Healthcare AI Models Should Be Skewed & Localized – and That’s Not Bias
This article discusses the significance of developing localized predictive models in healthcare and challenges the notion of a universal model.
On July 31, 2019, DeepMind published a research letter in Nature (via AI in Healthcare) about a clinically applicable approach to the continuous prediction of acute kidney injury (AKI). The predictive model was developed on a longitudinal dataset comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. It "predicted more than half (55.8%) of all inpatient episodes of acute kidney injury and 90.2% of all acute kidney injuries that required dialysis. The lead-up time was 48 hours, and the model had a ratio of two false alerts for every true alert."
TechCrunch reported on the study in an article titled "DeepMind touts predictive health care AI ‘breakthrough’ trained on heavily skewed data." The article has a negative tone and discusses the NHS data-sharing controversy and other problematic aspects of DeepMind’s projects. However, the leading problem it raises with the paper is that the training data skewed overwhelmingly male (93.6%). According to TechCrunch, "This is because DeepMind’s AI was trained using patient data provided by the U.S. Department of Veteran Affairs (VA)."
Healthcare AI Models Should Be Skewed
Calling this out as a problem is itself a problem. Predictive models in health care should be skewed or, to use a more common term, localized. The population served by the VA does skew heavily male; it also skews older than the general population and under-represents Hispanics, among other important differences. It is essential that a predictive model for that target population be trained on a dataset that resembles the actual population. Knowledge of the prior distributions of key features is critical in classical Bayesian statistics, in deep learning and especially when either is applied to health care.
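To make that concrete, here is a minimal sketch (my own illustration, not anything from the paper) of one way prior knowledge enters the picture: re-weighting a calibrated risk score when the deployment population's base rate differs from the training population's. All rates and numbers below are made up.

```python
# Bayes-rule re-weighting of a calibrated probability for a new base rate.
# The base rates here are illustrative assumptions, not figures from the paper.
def adjust_for_local_prior(p, train_rate, local_rate):
    odds = (p / (1 - p)) * (local_rate / train_rate) * ((1 - train_rate) / (1 - local_rate))
    return odds / (1 + odds)

# The same raw 30% risk score means something different if the local AKI
# base rate (assumed 2%) is lower than the training base rate (assumed 5%).
print(round(adjust_for_local_prior(0.30, train_rate=0.05, local_rate=0.02), 2))  # ~0.14
```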
This same model would not be applicable for predicting AKI at Seattle Children’s Hospital, Miami’s Jackson Memorial or Cedars-Sinai in LA. But that’s not because the original model is skewed; it’s because each of these hospitals requires a model that’s skewed to them: to their population, to their clinical workflows and to their data collection practices. There are well-understood transfer learning techniques for doing exactly that.
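As a rough illustration of what such localization could look like, here is a minimal transfer-learning sketch: freeze the feature layers of a network trained on the source population and fine-tune only the output head on the local hospital's data. The architecture, shapes and data below are placeholders, not DeepMind's actual model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Network trained on the source (e.g., VA) population; in practice you would
# load real pretrained weights instead of randomly initialized ones.
source_model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),    # feature layers
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),                # risk-score head
)

# Freeze the feature layers and replace the head for the local population.
for param in source_model[:4].parameters():
    param.requires_grad = False
source_model[4] = nn.Linear(128, 1)

optimizer = torch.optim.Adam(source_model[4].parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Placeholder stand-ins for the local hospital's features and AKI labels.
local_x = torch.randn(256, 64)
local_y = torch.randint(0, 2, (256, 1)).float()

for _ in range(20):                   # brief fine-tuning loop on local data
    optimizer.zero_grad()
    loss = loss_fn(source_model(local_x), local_y)
    loss.backward()
    optimizer.step()
```

Only the final layer is updated here; depending on how much local data a hospital has, more layers could be unfrozen.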
Calling for one model that would be representative of the whole population (of the U.S.? The UK? The whole world?) is a mistake. It simply fails in practice. Clinical models are applied at the hospitals that use them, unlike restaurant, dating or movie recommenders, which should work equally well for everyone who might try them.
What About Bias?
Bias in AI usually refers to differences in model accuracy between different groups. For example, it has been shown that machine learning models used to predict whether a criminal will re-offend are biased against Black defendants: the same model has substantially different accuracy for white and Black defendants. This would have a horrible effect on the criminal justice system, which must be colorblind.
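Measuring this kind of bias is straightforward in principle: compute the model's accuracy separately for each group and compare. A minimal sketch, with made-up data and column names of my own choosing:

```python
import pandas as pd

# Hypothetical predictions and outcomes; the data and column names are invented.
results = pd.DataFrame({
    "group":     ["white", "white", "white", "black", "black", "black"],
    "predicted": [1, 0, 0, 1, 1, 0],
    "actual":    [1, 0, 1, 0, 1, 1],
})

# Accuracy computed per group; a large gap between groups is the kind of
# disparity the re-offense models were criticized for.
accuracy_by_group = (
    results.assign(correct=results["predicted"] == results["actual"])
           .groupby("group")["correct"]
           .mean()
)
print(accuracy_by_group)
```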
AI models have been found to be biased against people of color in image recognition, against women in hiring, and against non-native speakers in speech recognition. This is likely just the tip of the iceberg. It’s unjust, unacceptable and, in some cases, illegal.
In health care, however, we do treat people differently based on who they are. Pediatric and geriatric medicine treat specific age groups. When we find genetic mutations that can help Asian or Jewish cancer patients, we celebrate the findings and don’t hide them. Choosing to practice women’s medicine is an act of specialization, not misandry. Diseases are biased — and so are treatments.
Going back to DeepMind’s AKI research, here is what it reported regarding gender and ethnicity: "In women, it predicted 44.8% of all AKI early, in men 56%, for those patients where gender was known. The model performance was higher on African American patients — 60.4% of AKIs detected early compared to 54.1% for all other ethnicities in aggregate."
If we have the chance to catch AKI early in 60.4% of African American patients (more than we currently do, and provided this can be further backed up in real clinical use), then it should become a recommended clinical guideline. If the 44.8% prediction rate for women does better in real use than current risk indicators, it should be adopted. The goal is to heal as many people as possible. People shouldn’t die while we’re figuring out how to get a model’s accuracy to be the same for every gender, age and ethnicity.
Compare this to the criminal justice bias example, where justice is the end goal and equality is a core tenet of justice. In such a system, the correct choice is not to use a biased model at all: It’s better that everyone gets a 20% accurate future-crime prediction than that white and Black people get 50% and 30% accuracy, respectively. The same goes for recruiting: It’s better to require corporations to hire more slowly (by not allowing them to prescreen candidates who are more likely to get hired) than to push men’s chances of getting hired over women’s even further than they already are.
Having a Better Public Discussion
Getting health care AI right is important. Mistakes don’t result in crappy playlists, irrelevant ads or having to repeat yourself to your smart speaker a third time. Mistakes result in needless death, pain and suffering. We must get things right when reporting on or teaching this technology. Please contact me if you disagree; I’m working to get educated here, too.