ml in medicine

some rough notes on ml in medicine


  • 3 types
    • disease and patient categorization (e.g. classification)
    • fundamental biological study
    • treatment of patients
  • philosophy
    • want to focus on problems doctors can’t do
    • alternatively, focus on automating problems doctors can do, to screen people at home in a cost-effective way
  • pathology - branch of medicine where you take some tissue from a patient (e.g. tumor), look at it under a microscope, and make an assessment of what the disease is
  • websites are often easier than apps for patients
  • The clinical artificial intelligence department: a prerequisite for success (cosgriff et al. 2020) - we need designated departments for clinical ai so we don’t have to rely on 3rd-party vendors and can test for things like distribution shift
  • challenges in ai healthcare (news)
    • adversarial examples
    • things can’t be de-identified
    • algorithms / data can be biased
    • correlation / causation get confused
  • healthcare is 20% of US GDP
  • prognosis is a guess as to the outcome of treatment
  • diagnosis is actually identifying the problem and giving it a name, such as depression or obsessive-compulsive disorder
  • AI is a technology, but it’s not a product
  • health economics incentives align with health incentives: catching tumor early is cheaper for hospitals


  • focus on building something you want to deploy
    • clinically useful - more efficient, cutting costs?
    • effective - does it improve the current baseline
    • focused on patient care - what are the unintended consequences
  • need to think a lot about regulation
    • USA: FDA
    • Europe: CE (more convoluted)
  • intended use
    • very specific and well-defined


medical system


  • doctors are evaluated infrequently (and things like personal traits are often included)
  • US has pretty good care but it is expensive per patient
  • expensive things (e.g. Da Vinci robot)
  • even if ml is not perfect, it may still outperform some doctors

medical education

  • rarely textbooks (often just slides)
  • 1-2% miss rate for diagnosis can be seen as acceptable
  • how doctors think
    • 2 years: memorizing facts about physiology, pharmacology, and pathology
    • 2 years learning practical applications for this knowledge, such as how to decipher an EKG and how to determine the appropriate dose of insulin for a diabetic
    • little emphasis on mental logic for making a correct diagnosis and avoiding mistakes
    • see work by pat croskerry
    • there is limited data on misdiagnosis rates
    • representativeness error - thinking is overly influenced by what is typically true
    • availability error - tendency to judge the likelihood of an event by the ease with which relevant examples come to mind
      • common infections tend to occur in epidemics, afflicting large numbers of people in a single community at the same time
      • confirmation bias
    • affective error - decisions based on what we wish were true (e.g. caring too much about patient)
    • See one, do one, teach one - teaching axiom

political elements

  • why doctors should organize
  • big pharma
  • day-to-day
    • Doctors now face a burnout epidemic: thirty-five per cent of them show signs of high depersonalization
    • according to one recent report, only thirteen per cent of a physician’s day, on average, is spent on doctor-patient interaction
    • one study: during an average eleven-hour workday, six hours are spent at the keyboard, maintaining electronic health records
    • Medicare’s RVU (relative value unit) system - changes how doctors are reimbursed, emphasizing procedural over cognitive work
    • ai could help - make simple diagnoses faster, reduce paperwork, help patients manage their own diseases like diabetes
    • ai could also make things worse - hospitals are mostly run by business people

medical communication

“how do doctors think?”

communicating findings

  • don’t use ROC curves, use deciles
  • need to evaluate use, not just metric
  • model -> fitted model
  • retrospective (looks back, more confounding) vs prospective study
  • internal/external validity ≈ train/test error (although external validation usually uses a different patient population, so it is a stronger test)
  • sensitivity = recall; specificity = true negative rate, which is not the same as precision (precision = positive predictive value, PPV)
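
A minimal sketch of two points from the list above: how sensitivity and specificity relate to precision and recall, and what a decile summary looks like as an alternative to a ROC curve. The function names, the toy labels, and the choice of ten equal-sized bins sorted by predicted risk are assumptions made for illustration, not something these notes prescribe.

```python
import math
from typing import Dict, List, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and precision from binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # identical to recall
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value (PPV)
    }


def decile_table(y_true: Sequence[int], risk: Sequence[float]) -> List[Dict[str, float]]:
    """Sort patients by predicted risk and summarize ten equal-sized groups."""
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    rows = []
    for d in range(10):
        idx = order[d * n // 10:(d + 1) * n // 10]
        if not idx:  # fewer than ten patients: some bins stay empty
            continue
        rows.append({
            "decile": d + 1,
            "n": len(idx),
            "mean_predicted": sum(risk[i] for i in idx) / len(idx),
            "observed_rate": sum(y_true[i] for i in idx) / len(idx),
        })
    return rows


# Toy data (made up): 3 diseased and 7 healthy patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
metrics = confusion_metrics(y_true, y_pred)

# Toy risk scores (made up): 20 patients, events only among the highest scores.
risk = [i / 20 for i in range(20)]
outcome = [1 if i >= 15 else 0 for i in range(20)]
table = decile_table(outcome, risk)
```

Each row of `table` reads as "patients in this risk decile had this observed event rate", which is often easier to discuss with clinicians than an area under a curve.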


successful examples of ai in medicine

  • ECG (NEJM, 1991)
  • EKG printouts already come with a short automated interpretation
  • there used to be bayesian networks / expert systems but they went away…

icu interpretability example

  • goal: explain the model not the patient (that is the doctor’s job)
  • want to know interactions between features
  • some features are difficult to understand
    • e.g. max over this window, might seem high to a doctor unless they think about it
  • some features don’t really make sense to change (e.g. was this thing measured)
  • doctors like to see trends - patient health changes over time and must include history
  • feature importance under intervention
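
To make the two feature types mentioned above concrete ("max over this window" and "was this thing measured"), here is a rough sketch. The function name, the trailing-window definition, and the use of `None` for a missing measurement are assumptions for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple


def window_features(values: Sequence[Optional[float]], window: int) -> List[Tuple[float, int]]:
    """For each time step return (max over the trailing window, was_measured flag).

    `None` marks a time step with no measurement. The window max is NaN when
    nothing inside the window was measured.
    """
    feats = []
    for t in range(len(values)):
        recent = [v for v in values[max(0, t - window + 1): t + 1] if v is not None]
        window_max = max(recent) if recent else math.nan
        was_measured = int(values[t] is not None)
        feats.append((window_max, was_measured))
    return feats


# Hypothetical heart-rate series with gaps, one value per hour.
heart_rate = [80, None, 120, 90, None, None, None]
feats = window_features(heart_rate, window=3)
```

The window max stays at 120 for two hours after the spike, which is why such a feature can look surprisingly high to a doctor reading the current value, and the `was_measured` flag is an example of a feature that makes little sense to "change" in a counterfactual explanation.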

high-performance ai studies

  • chest-xray: chexnet
  • echocardiograms: madani, ali, et al. 2018
  • skin: esteva, andre, et al. 2017
  • pathology: campanella, gabriele, et al. 2019
  • mammogram: kerlikowske, karla, et al. 2018

medical imaging

improving medical studies

  • Machine learning methods for developing precision treatment rules with observational data (Kessler et al. 2019)
    • goal: find precision treatment rules
    • problem: need large sample sizes but can’t obtain them in RCTs
    • recommendations
      • screen important predictors using large observational medical records rather than RCTs
        • important to do matching / weighting to account for bias in treatment assignments
        • alternatively, can look for natural experiment / instrumental variable / discontinuity analysis
        • has many benefits
      • modeling: should use ensemble methods rather than individual models
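
The weighting idea mentioned above (accounting for bias in treatment assignment in observational data) can be sketched with inverse-probability weighting. This is only an illustration under strong assumptions: a single discrete confounder, propensities estimated as the treated fraction within each stratum, and every stratum containing both treated and untreated patients. The function names and the toy data are hypothetical and are not taken from Kessler et al.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def propensity_by_stratum(strata: Sequence[Hashable], treated: Sequence[int]) -> Dict[Hashable, float]:
    """Estimate P(treated | stratum) as the treated fraction in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for s, t in zip(strata, treated):
        counts[s][0] += t
        counts[s][1] += 1
    return {s: n_t / n for s, (n_t, n) in counts.items()}


def naive_difference(treated: Sequence[int], outcome: Sequence[float]) -> float:
    """Unadjusted difference in mean outcome, treated minus untreated."""
    y1 = [y for t, y in zip(treated, outcome) if t == 1]
    y0 = [y for t, y in zip(treated, outcome) if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def ipw_difference(strata: Sequence[Hashable], treated: Sequence[int], outcome: Sequence[float]) -> float:
    """Inverse-probability-weighted difference in mean outcome (normalized weights)."""
    e = propensity_by_stratum(strata, treated)
    w1 = w0 = y1 = y0 = 0.0
    for s, t, y in zip(strata, treated, outcome):
        if t == 1:
            w = 1.0 / e[s]          # up-weight treated patients who were unlikely to be treated
            w1 += w
            y1 += w * y
        else:
            w = 1.0 / (1.0 - e[s])  # up-weight untreated patients who were likely to be treated
            w0 += w
            y0 += w * y
    return y1 / w1 - y0 / w0


# Toy data (made up): severe patients are treated more often and do worse,
# while treatment adds +1 to the outcome in both strata.
strata = ["mild"] * 4 + ["severe"] * 4
treated = [1, 0, 0, 0, 1, 1, 1, 0]
outcome = [11, 10, 10, 10, 3, 3, 3, 2]

naive = naive_difference(treated, outcome)
adjusted = ipw_difference(strata, treated, outcome)
```

In this toy example the unadjusted comparison makes the treatment look harmful because sicker patients receive it more often, while the weighted estimate recovers the benefit built into the data.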



  • pathologists work with tissue samples either visually or chemically
    • anatomic pathology relies on the microscope whereas clinical pathology does not
  • pathologists convert from tissue image into written report
  • when case is challenging, may require a second opinion (v rare)
  • steps (process takes 9-12 hrs)
    • tissue is surgically removed
      • more tissue collected is generally better (gives more context)
      • this procedure is called a biopsy
      • much is written down at this step (e.g. race, gender, locations in organ, different tumors in an organ) that can’t be seen in slide alone
    • fixation: keeps the tissue stable (also preserves DNA) - basically just soak in formalin
    • dissection: remove the relevant part of the tissue
    • tissue processor - removes water from the tissue and substitutes wax (paraffin) - hardens it and makes it easy to cut into thin slices
    • microtome - cuts very thin slices of the tissue (2-3 microns)
    • staining
      • H & E - hematoxylin and eosin stain - most popular (~80%) - colors the cells in a specific way, bc cells are usually pretty transparent
        • hematoxylin stains nucleic acids blue
        • eosin stains proteins / cytoplasm pink/red
      • immunohistochemistry (IHC) - tries to identify cell lineage: 10-15%
        • identifies targets
        • use antibodies tagged with chromophores to tag tissues
      • gram stain - highlights bacteria
      • giemsa - microorganisms
      • others…for muscle, fungi
    • viewing
      • usually analog - put slide on something that can move / rotate
      • whole-slide image (WSI) - resulting entire slide
        • tissue microarray (TMA) - smaller, fits many samples onto the same slide
      • with paige: put slide through digital scanner (only 5% or so of slides are currently digital)
    • later on, board meets to decide on treatment (based on pathology report)
      • usually some discussion between original imaging (pre-biopsy) and pathologist’s interpretation
    • resection - after initial diagnosis, often entire tumor is removed (resection)
  • how can ai help?
    • can help identify small things in large images
    • can help with conflict resolution
  • after (successful) neoadjuvant chemotherapy, problem becomes more difficult
    • very few remaining cancer cells
    • cancer/non-cancer cells become harder to distinguish (esp. for prostate)
    • tumor bed is patchily filled with cancer cells - need to better clarify presence of cancer


  • Deep Learning Models for Digital Pathology (BenTaieb & Hamarneh, 2019)
    • note: alternatives to histopathology are more expensive / slower (e.g. molecular profiling)
    • to promote consistency and objective inter-observer agreement, most pathologists are trained to follow simple algorithmic decision rules that sufficiently stratify patients into reproducible groups based on tumor type and aggressiveness
    • magnification usually given in microns per pixel
    • WSI files are much larger than other digital images (e.g. for radiology)
    • DNNs can be used for many tasks: beyond just classification, there are subtasks (e.g. count histological primitives, like nuclei) and preprocessing tasks (e.g. stain normalization)
    • challenge: multi-magnification + high dimensions (i.e. millions of pixels)
      • people usually extract smaller patches and train on these
        • this loses larger context
        • one soln: pyramid representation: extract patches at different magnification levels
        • one soln: stacked CNN - train fully-conv net, then remove linear layer, freeze, and train another fully-conv net on the activations (so it now has larger receptive field)
        • one soln: use 2D LSTM to aggregate patch reprs.
      • challenge: annotations only at the entire-slide level, but must figure out how to train on individual patches
        • e.g. use aggregation techniques on patches - extract patch-wise features then do smth simple, like random forest
        • e.g. treat as weak labels or do multiple-instance learning
          • could just give slide-level label to all patches then vote
      • can use transfer learning from related domains with more labels
    • challenge: class imbalance
      • can use boosting approach to increase the likelihood of sampling patches that were originally incorrectly classified by the model
    • challenge: need to integrate in other info, such as genomics
    • when predicting histological primitives, often predict pixel-wise probability maps, then look for local maxima
      • can also integrate domain-knowledge features
      • can also have 2 paths, one making bounding-box proposals and another predicting the probability of a class
      • alternatively, can formulate as a regression task, where pixelwise prediction tells distance to nearest centroid of object
      • could also just directly predict the count
    • can also predict survival (survival analysis)
  • Clinical-grade computational pathology using weakly supervised deep learning on whole slide images (campanella et al. 2019)
    • use slide-level diagnosis as “weak supervision” for all contained patches
    • 1st step: train patch-level CNNs using MIL
      • if label is 0, then all patches should be 0
      • if label is 1, then only pass gradients to the top-k predicted patches
    • 2nd step: use RNN (or another net) to combine info across S most suspicious tiles
  • Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes (diao et al. 2021)
  • An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study (pantanowitz et al. 2020 - ibex)
    • 549 train, 2501 internal test slides, 1627 external validation
    • predict cancer prob., gleason score 7-10, gleason pattern 5, perineural invasion, cancer percentage
    • algorithm
      • GB classifies background / non-background / blurry using hand-extracted features for each tile
      • each tile gets predicted probability for 18 pre-defined classes (e.g. GP 3)
        • ensemble of 3 CNNs that operate at different magnifications
      • aggregation: 18-probability heatmaps are combined to calculate slide-level scores
        • ex (for predicting cancer): sum the cancer-related channels in the heatmap, apply 2x2 local averaging, then take max
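
The slide-level aggregation example above (sum the cancer-related channels, 2x2 local averaging, then max) could look roughly like the sketch below. The function name, the nested-list layout `heatmaps[channel][row][col]`, and the stride-1, no-padding averaging window are assumptions; the notes do not record those details of the published algorithm.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one probability per tile, indexed [row][col]


def slide_cancer_score(heatmaps: Sequence[Grid], cancer_channels: Sequence[int]) -> float:
    """Aggregate per-tile class probabilities into one slide-level cancer score.

    Assumes every channel has the same shape and at least 2x2 tiles.
    """
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    # 1) sum the cancer-related channels tile by tile
    summed = [
        [sum(heatmaps[c][i][j] for c in cancer_channels) for j in range(cols)]
        for i in range(rows)
    ]
    # 2) 2x2 local averaging (assumed stride 1, no padding)
    averaged = [
        (summed[i][j] + summed[i][j + 1] + summed[i + 1][j] + summed[i + 1][j + 1]) / 4
        for i in range(rows - 1)
        for j in range(cols - 1)
    ]
    # 3) the slide score is the strongest local neighborhood
    return max(averaged)


# Toy 3x3 heatmaps for three hypothetical classes: 0 = benign, 1 and 2 = cancer patterns.
benign = [[1.0, 1.0, 1.0], [1.0, 0.2, 0.2], [1.0, 0.2, 0.2]]
pattern_a = [[1.0, 0.0, 0.0], [0.0, 0.4, 0.4], [0.0, 0.4, 0.4]]  # isolated hit at top-left
pattern_b = [[0.0, 0.0, 0.0], [0.0, 0.4, 0.4], [0.0, 0.4, 0.4]]
score = slide_cancer_score([benign, pattern_a, pattern_b], cancer_channels=[1, 2])
```

The local averaging means a single isolated high-probability tile (top-left in the toy data) contributes less than a consistent 2x2 neighborhood of moderately suspicious tiles.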


  • ARCH - multiple instance captioning dataset to facilitate dense supervision of CP tasks
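
For the weakly supervised multiple-instance learning (MIL) step recorded above for campanella et al. 2019, the patch-selection rule as written in these notes can be sketched as follows: a negative slide supervises every patch with label 0, while a positive slide only supervises its top-k highest-scoring patches. The function names, the binary cross-entropy loss, and the `eps` clipping are illustrative assumptions, and the published method may differ in details.

```python
import math
from typing import List, Sequence, Tuple


def mil_targets(patch_probs: Sequence[float], slide_label: int, k: int = 1) -> List[Tuple[int, int]]:
    """Return (patch_index, target) pairs that contribute to the training loss."""
    if slide_label == 0:
        # negative slide: every patch should be predicted as 0
        return [(i, 0) for i in range(len(patch_probs))]
    # positive slide: only the k most suspicious patches receive gradient
    ranked = sorted(range(len(patch_probs)), key=lambda i: patch_probs[i], reverse=True)
    return [(i, 1) for i in ranked[:k]]


def mil_loss(patch_probs: Sequence[float], slide_label: int, k: int = 1, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over the selected patches (assumes at least one patch)."""
    selected = mil_targets(patch_probs, slide_label, k)
    total = 0.0
    for i, target in selected:
        p = min(max(patch_probs[i], eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if target == 1 else -math.log(1 - p)
    return total / len(selected)


# Hypothetical patch-level cancer probabilities for one slide.
probs = [0.1, 0.9, 0.3, 0.7]
positive_targets = mil_targets(probs, slide_label=1, k=2)
negative_targets = mil_targets(probs, slide_label=0)
```

In a real training loop the probabilities would come from the patch-level CNN and the loss would be backpropagated only through the selected patches.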



  • tumor = neoplasm - a mass formation from an uncontrolled growth of cells
    • benign tumor - typically stays confined to the organ where it is present and does not cause functional damage
    • malignant tumor = cancer - compromises organ function and can spread to other organs (metastasis)
  • relation network based aggregator on patches
  • lymphatic system drains fluids (non-blood) from organs into lymph nodes
    • cancer often metastasizes through these
  • staging - describes where cancer is located and where it has spread
    • clinical staging - based on non-tissue things
    • pathological staging - elements of staging pTNM
      • size / depth of tumor “T”
      • number of lymph nodes / how many had cancer “N”
      • number of metastatic foci in non-lymph node organ “M”
      • these are combined to determine the cancer stage (0-4)
  • prognosis - chance of recovery


  • chemo
    • traditional chemotherapy disrupts cell replication
      • hair loss and gastrointestinal symptoms occur bc these cells also rapidly replicate
    • adjuvant chemotherapy - after cancer is removed, most common
    • neoadjuvant chemo - after biopsy, but before resection (when very hard to remove)
  • targeted therapies
    • ex. address genetic aberration found in cancer cells
    • immunotherapy - enhance body’s immune response to cancer cells (so body will attack these cells on its own)
      • want the antigens on the tumor to be as different as possible (so they will be characterized as foreign)
        • to measure this, can run a tumor mutational burden (TMB) or microsatellite instability (MSI) test
        • genetic tests - hard to do by looking at glass slide
      • some tumors express receptors (e.g. CTLA4, PD1) that shut off immune cells - some drugs try to block these receptors

prostate cancer

bladder cancer

H & E slide

  • shape: papillary or flat (can also be a combination)
  • grade: low or high
  • when shape is flat, grade often can’t be determined reliably
    • lots of names for uncertain cases (e.g. UPUMP - urothelial proliferation of uncertain malignant potential, or atypia)
  • much easier to decide shape than grade
  • once you find high grade, look for invasiveness (and deeper layers are worse)