ml in medicine
Some rough notes on ml in medicine
datasets
- physionet
- mimic-iv
- nih datasets
- mdcalc datasets
- pecarn
- openneuro
- clinicaltrials.gov - has thousands of active trials with long plain-text descriptions
- fairhealth - (paid) custom datasets that can include elements such as patients’ age and gender distribution, ICD-9 and ICD-10 procedure codes, geographic locations, professionals’ specialties and more
- other claims data is available but not clean
- prospero - website for registering systematic reviews / meta-analyses
nlp
- n2c2 tasks
- MedNLI - NLI task grounded in patient history (romanov & shivade, 2018)
- derived from MIMIC, but expertly annotated
- i2b2 named entity recognition tasks
- i2b2 2006, 2010, 2012, 2014
- CASI dataset - collection of abbreviations and acronyms (short forms) with their possible senses (long forms), along with other corresponding information about these terms
- some extra annotations by agrawal…sontag, 2022
- PMC-Patients - open-source patient snippets, but no groundtruth labels besides age, gender
- EBM-NLP - annotates PICO (Participants, Interventions, Comparisons and Outcomes) spans in clinical trial abstracts
- task - identify the spans that describe the respective PICO elements
- review paper on clinical IE (wang…liu, 2017)
- mimic-iv-benchmark (xie…liu, 2022)
- 3 tabular datasets derived from MIMIC-IV ED EHR
- hospitalization (versus discharge) - an inpatient care site admission immediately following the ED visit
- critical - inpatient mortality / transfer to an ICU within 12 hours
- reattendance - patient’s return visit to ED within 72 hours
- preprocessing for outliers / missing values (extended descriptions of variables here)
- patient history
- past ed visits, hospitalizations, icu admissions, comorbidities
- ICD codes give patients' comorbidities (CCI: Charlson comorbidity index; ECI: Elixhauser comorbidity index)
- info at triage
- temp., heart rate, pain scale, ESI, …
- Emergency severity index (ESI) - 5-level triage system assigned by nurse based on clinical judgments (1 is highest priority)
- top 10 chief complaints
- No neurological features (e.g. GCS)
- info before discharge
- vitalsigns
- edstays
- medication prescription
- patient history
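The hospitalization task above can be sketched as a simple tabular baseline. This is a minimal sketch on synthetic data: the feature list mimics the triage variables mentioned above, but the values and the logistic-regression choice are hypothetical stand-ins, not the benchmark's actual pipeline.

```python
# Sketch of the mimic-iv-benchmark setup (xie…liu, 2022): predict a binary
# hospitalization outcome from triage-time tabular features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
# hypothetical triage features: temperature, heart rate, pain scale, ESI level
X = np.column_stack([
    rng.normal(37.0, 0.7, n),   # temperature (C)
    rng.normal(90, 20, n),      # heart rate (bpm)
    rng.integers(0, 11, n),     # pain scale 0-10
    rng.integers(1, 6, n),      # ESI level, 1 = highest priority
])
# synthetic outcome: sicker-looking patients (high HR, low ESI) admitted more often
logit = 0.04 * (X[:, 1] - 90) - 0.8 * (X[:, 3] - 3)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"held-out AUROC: {auc:.2f}")
```

The real benchmark adds the patient-history features (past visits, comorbidity indices) and the preprocessing for outliers and missing values described above.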
CDI bias
- Race/sex overviews
- Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms (vyas, eisenstein, & jones, 2020)
- Now is the Time for a Postracial Medicine: Biomedical Research, the National Institutes of Health, and the Perpetuation of Scientific Racism (2017)
- A Systematic Review of Barriers and Facilitators to Minority Research Participation Among African Americans, Latinos, Asian Americans, and Pacific Islanders (george, duran, & norris, 2014)
- The Use of Racial Categories in Precision Medicine Research (callier, 2019)
- Field Synopsis of Sex in Clinical Prediction Models for Cardiovascular Disease (paulus…kent, 2016) - supports the use of sex in predicting CVD, but not all CDIs use it
- Race Corrections in Clinical Models: Examining Family History and Cancer Risk (zink, obermeyer, & pierson, 2023) - family history variables mean different things for different groups depending on how much healthcare history their family had
- ML papers
- When Personalization Harms Performance: Reconsidering the Use of Group Attributes in Prediction (suriyakumar, ghassemi, & ustun, 2023) - group attributes often improve performance at a population level but hurt at a group level
- Coarse race data conceals disparities in clinical risk score performance (movva…pierson, 2023)
- CDI guidelines
- Reporting and Methods in Clinical Prediction Research: A Systematic Review (bouwmeester…moons, 2012) - review publications in 2008, mostly about algorithmic methodology
- Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement (collins…moons, 2015)
- Framework for the impact analysis and implementation of Clinical Prediction Rules (CPRs) (IDAPP group, 2011) - stress validating old rules
- Predictability and stability testing to assess clinical decision instrument performance for children after blunt torso trauma (kornblith…yu, 2022) - stress the use of stability, application to IAI
- Methodological standards for the development and evaluation of clinical prediction rules: a review of the literature (cowley…kemp, 2019)
- Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities (paulus & kent, 2020)
- Translating Clinical Research into Clinical Practice: Impact of Using Prediction Rules To Make Decisions (reilly & evans, 2006)
- Individual CDIs
- Reconsidering the Consequences of Using Race to Estimate Kidney Function. (eneanya, yang, & reese, 2019)
- Dissecting racial bias in an algorithm used to manage the health of populations (obermeyer et al. 2019) - for one algorithm, at a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses
- Race, Genetic Ancestry, and Estimating Kidney Function in CKD (CRIC, 2021)
- Prediction of vaginal birth after cesarean delivery in term gestations: a calculator without race and ethnicity (grobman et al. 2021)
- LLM bias
- Coding Inequity: Assessing GPT-4’s Potential for Perpetuating Racial and Gender Biases in Healthcare (zack…butte, alsentzer, 2023)
- biased outcomes
- On the Inequity of Predicting A While Hoping for B (mullainathan & obermeyer, 2021)
- Algorithm was specifically trained to predict health-care costs
- Because of structural biases and differential treatment, Black patients with similar needs to white patients have long been known to have lower costs
- real goal was to “determine which individuals are in need of specialized intervention programs and which intervention programs are likely to have an impact on the quality of individuals’ health.”
ucsf de-id data
- black-box
- predict postoperative delirium (bishara, …, donovan, 2022)
- interpretable
- predict multiple sclerosis by incorporating domain knowledge into a biomedical knowledge graph (nelson, …, baranzini, 2022)
- predict mayo endoscopic subscores from colonoscopy reports (silverman, …, 2022)
- 3 types
- disease and patient categorization (e.g. classification)
- fundamental biological study
- treatment of patients
- philosophy
- want to focus on problems doctors can’t do
- alternatively, focus on automating problems patients can do to screen people at home in a cost-effective way
- pathology - branch of medicine where you take some tissue from a patient (e.g. a tumor), look at it under a microscope, and make an assessment of what the disease is
- websites are often easier than apps for patients
- The clinical artificial intelligence department: a prerequisite for success (cosgriff et al. 2020) - we need designated departments for clinical ai so we don’t have to rely on 3rd-party vendors and can test for things like distr. shift
- challenges in ai healthcare (news)
- adversarial examples
- data often can’t be fully de-identified
- algorithms / data can be biased
- correlation / causation get confused
- healthcare is 20% of US GDP
- prognosis is a guess as to the outcome of treatment
- diagnosis is actually identifying the problem and giving it a name, such as depression or obsessive-compulsive disorder
- AI is a technology, but it’s not a product
- health economics incentives align with health incentives: catching a tumor early is cheaper for hospitals
high-level
- focus on building something you want to deploy
- clinically useful - more efficient, cutting costs?
- effective - does it improve the current baseline
- focused on patient care - what are the unintended consequences
- need to think a lot about regulation
- USA: FDA
- Europe: CE (more convoluted)
- intended use
- very specific and well-defined
medical system
evaluation
- doctors are evaluated infrequently (and things like personal traits are often included)
- US has pretty good care but it is expensive per patient
- expensive things (e.g. Da Vinci robot)
- even if ml is not perfect, it may still outperform some doctors
- The impact of inconsistent human annotations on AI driven clinical decision making (sylolypavan…sim, 2023) - labels / majority vote are often very inconsistent
medical education
- rarely textbooks (often just slides)
- 1-2% miss rate for diagnosis can be seen as acceptable
- how doctors think
- 2 years: memorizing facts about physiology, pharmacology, and pathology
- 2 years learning practical applications for this knowledge, such as how to decipher an EKG and how to determine the appropriate dose of insulin for a diabetic
- little emphasis on mental logic for making a correct diagnosis and avoiding mistakes
- see work by pat croskerry
- there is limited data on misdiagnosis rates
- representativeness error - thinking is overly influenced by what is typically true
- availability error - tendency to judge the likelihood of an event by the ease with which relevant examples come to mind
- common infections tend to occur in epidemics, afflicting large numbers of people in a single community at the same time
- confirmation bias
- affective error - decisions based on what we wish were true (e.g. caring too much about patient)
- See one, do one, teach one - teaching axiom
political elements
- why doctors should organize
- big pharma
- day-to-day
- Doctors now face a burnout epidemic: thirty-five per cent of them show signs of high depersonalization
- according to one recent report, only thirteen per cent of a physician’s day, on average, is spent on doctor-patient interaction
- one study found that during an average eleven-hour workday, six hours are spent at the keyboard maintaining electronic health records
- medicare’s r.v.u. (relative value unit) system - changes how doctors are reimbursed, emphasizing procedural over cognitive work
- ai could help - make simple diagnoses faster, reduce paperwork, help patients manage their own diseases like diabetes
- ai could also make things worse - hospitals are mostly run by business people
medical communication
“how do doctors think?”
- easy to misinterpret things to be causal
- often no intuition for even relatively simple engineered features, such as averages
- doctors require context for features (e.g. this feature is larger than the average)
- often have some rules memorized (otherwise memorize what needs to be looked up)
- unclear how well doctors follow rules
- some rules are 1-way (e.g. only follow it if it says there is danger, otherwise use your best judgement)
- 2-way rules are better
- without proper education 1-way rules can be dangerously used as 2-way rules
- doesn’t make sense to judge 1-way rules on both specificity and sensitivity
- rules are often ambiguous (e.g. what constitutes vomiting)
- doctors adapt to personal experience - may be unfair to evaluate them on larger dataset
- sometimes said that doctors know 10 medications by heart
- Overconfidence in Clinical Decision Making (croskerry 2008)
- most uncertainty: family medicine [FM] and emergency medicine [EM]
- some uncertainty: internal medicine
- little uncertainty: specialty disciplines
- 2 systems at work: intuitive (uses context, heuristics) vs analytic (systematic, rule-based)
- a combination of both performs best
- doctors are often black boxes as well - validated infrequently, unclear how closely they follow rules
- doctors adapt to local conditions - should be evaluated only on local dataset
- potential liabilities for physicians using ai (price et al. 2019)
- What’s the trouble. How doctors think. New Yorker. 2007
- JAMA Users’ Guide to the Medical Literature
- TRIPOD 22 points paper
- basic stats in the step1 exam
- How to Read Articles That Use Machine Learning: Users’ Guides to the Medical Literature (liu et al. 2019)
- Carmelli et al. 2018 - primer for CDRs but also a good example of what sort of article I have envisioned creating.
- Looking through the retrospectoscope: reducing bias in emergency medicine chart review studies. (kaji et al. 2018)
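The point above about 1-way rules can be made concrete: since a 1-way rule only tells you to act when it flags danger (otherwise you fall back to judgment), what matters is whether it misses dangerous cases, i.e. sensitivity and negative predictive value, not specificity. A minimal sketch with hypothetical rule flags and outcomes:

```python
# A one-way rule should be judged on sensitivity (did it catch the dangerous
# cases?) and NPV (how safe is it to trust a negative?), not specificity.
def one_way_metrics(flags, outcomes):
    """flags[i]: rule flagged danger; outcomes[i]: danger actually occurred."""
    tp = sum(f and o for f, o in zip(flags, outcomes))
    fn = sum((not f) and o for f, o in zip(flags, outcomes))
    tn = sum((not f) and (not o) for f, o in zip(flags, outcomes))
    sensitivity = tp / (tp + fn)  # fraction of dangerous cases flagged
    npv = tn / (tn + fn)          # fraction of negatives that were truly safe
    return sensitivity, npv

# hypothetical data: the rule over-flags (low specificity) but misses nothing
flags    = [True, True, True, False, False, False, True]
outcomes = [True, True, False, False, False, False, True]
sens, npv = one_way_metrics(flags, outcomes)
print(f"sensitivity={sens:.2f}, NPV={npv:.2f}")
```

Here the rule's false positive (case 3) costs specificity, but that is acceptable by design: clinicians use their own judgment on negatives, so the dangerous failure mode is a false negative.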
communicating findings
- don’t use ROC curves, use deciles
- need to evaluate use, not just metric
- model -> fitted model
- retrospective (more confounding, looks back) vs prospective study
- internal/external validity = train/test (although external was usually using different patient population, so is stronger)
- sensitivity = recall (but note specificity ≠ precision; precision corresponds to PPV)
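The decile suggestion above can be sketched directly: bin predicted risks into ten groups and compare mean predicted risk against the observed event rate in each group, which clinicians can read as a table. The risk scores and outcomes below are synthetic, for illustration only.

```python
# "use deciles": a decile calibration table instead of an ROC curve.
import numpy as np

rng = np.random.default_rng(0)
pred = rng.random(5000)                       # predicted risks in [0, 1]
obs = (rng.random(5000) < pred).astype(int)   # outcomes drawn at those risks

order = np.argsort(pred)                      # sort patients by predicted risk
deciles = np.array_split(order, 10)           # ten equal-size risk groups
for i, idx in enumerate(deciles, 1):
    print(f"decile {i}: mean predicted {pred[idx].mean():.2f}, "
          f"observed rate {obs[idx].mean():.2f}")
```

A well-calibrated model shows predicted and observed rates tracking each other across deciles; a discriminative but miscalibrated model still shows observed rates rising monotonically.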
examples
successful examples of ai in medicine
- ECG (NEJM, 1991)
- EKG printouts already include a small automated interpretation
- there used to be bayesian networks / expert systems but they went away…
icu interpretability example
- goal: explain the model not the patient (that is the doctor’s job)
- want to know interactions between features
- some features are difficult to understand
- e.g. the max over a time window might seem high to a doctor unless they think about it
- some features don’t really make sense to change (e.g. was this thing measured)
- doctors like to see trends - patient health changes over time and must include history
- feature importance under intervention
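One concrete reading of "feature importance under intervention" is permutation importance: intervene on one feature at a time by shuffling it and measure the drop in model performance. This is an assumption about what the note means, sketched on synthetic data; sklearn's `permutation_importance` implements the shuffle-and-score loop.

```python
# Permutation importance: shuffle each feature and measure the score drop.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# synthetic label: only feature 0 carries signal
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # feature 0 should dominate
```

Note the caveat from the list above still applies: permuting a feature that "doesn't make sense to change" (e.g. an indicator for whether something was measured) produces an importance score but not a clinically meaningful intervention.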
high-performance ai studies
- chest-xray: chexnet
- echocardiograms: madani, ali, et al. 2018
- skin: esteva, andre, et al. 2017
- pathology: campanella, gabriele, et al. 2019
- mammogram: kerlikowske, karla, et al. 2018
medical imaging
- Medical Imaging and Machine Learning
- medical images often have multiple channels / are 3d - closer to video than images
improving medical studies
- Machine learning methods for developing precision treatment rules with observational data (Kessler et al. 2019)
- goal: find precision treatment rules
- problem: need large sample sizes but can’t obtain them in RCTs
- recommendations
- screen important predictors using large observational medical records rather than RCTs
- important to do matching / weighting to account for bias in treatment assignments
- alternatively, can look for natural experiment / instrumental variable / discontinuity analysis
- has many benefits
- modeling: should use ensemble methods rather than individual models
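The matching/weighting recommendation above can be sketched with inverse-propensity weighting (IPW), one standard way to account for non-random treatment assignment in observational data. All data below are synthetic, with a known treatment effect of 2.0 so the bias correction is visible.

```python
# IPW sketch: reweight patients by 1/P(treatment received | confounders).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # confounder (e.g., baseline severity)
p_treat = 1 / (1 + np.exp(-x))               # sicker patients treated more often
t = (rng.random(n) < p_treat).astype(int)    # observed (non-random) treatment
y = 2.0 * t + 1.5 * x + rng.normal(size=n)   # outcome; true effect of t is 2.0

naive = y[t == 1].mean() - y[t == 0].mean()  # biased by confounding on x

# estimate propensity scores, then weight each patient by 1/propensity
ps = LogisticRegression().fit(x.reshape(-1, 1), t).predict_proba(x.reshape(-1, 1))[:, 1]
w = t / ps + (1 - t) / (1 - ps)
ipw = (np.average(y[t == 1], weights=w[t == 1])
       - np.average(y[t == 0], weights=w[t == 0]))
print(f"naive: {naive:.2f}, IPW: {ipw:.2f}  (true effect: 2.0)")
```

The naive difference is inflated because sicker (high-x) patients are both treated more often and have worse-on-average confounded outcomes; the weighted estimate recovers something close to the true effect. Matching, instrumental variables, and discontinuity designs mentioned above are alternative routes to the same goal.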