Chandan Singh | uncertainty



some notes on uncertainty in machine learning, particularly deep learning


  • calibration - predicted probabilities should match real probabilities
    • platt scaling - given trained classifier and new calibration dataset, basically just fit a logistic regression from the classifier predictions -> labels
    • isotonic regression - nonparametric, requires more data than platt scaling
      • piecewise-constant non-decreasing function instead of logistic regression
  • ensemble uncertainty
  • neural network basic uncertainty: predicted probability = confidence, max margin, entropy of predicted probabilities across classes
  • Single-Model Uncertainties for Deep Learning (tagasovska & lopez-paz 2019) - use simultaneous quantile regression
  • quantile regression - use quantile loss to penalize over- and under-prediction asymmetrically + get confidence intervals
    • can easily do this with sklearn
    • quantile loss = $\begin{cases} \alpha \cdot \Delta & \text{if} \quad \Delta > 0\\(\alpha - 1) \cdot \Delta & \text{if} \quad \Delta < 0\end{cases}$
      • $\Delta =$ actual - predicted
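
a minimal sketch of the quantile loss above and of the sklearn route to intervals. the synthetic sine data, the 5%/95% quantile choice, and the helper name `quantile_loss` are assumptions made for illustration, not part of the cited paper (which trains one network for all quantiles simultaneously rather than one model per quantile).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def quantile_loss(actual, predicted, alpha):
    """Mean quantile (pinball) loss with delta = actual - predicted."""
    delta = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    # alpha * delta when under-predicting, (alpha - 1) * delta when over-predicting
    return float(np.mean(np.where(delta > 0, alpha * delta, (alpha - 1) * delta)))


# synthetic data (assumption): noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# one model per quantile; the 5% and 95% quantiles together bracket a nominal 90% interval
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0).fit(X, y)

# fraction of training targets that fall inside the predicted interval
coverage = float(np.mean((y >= lower.predict(X)) & (y <= upper.predict(X))))
```

with $\alpha = 0.9$, under-predicting by 1 costs 0.9 while over-predicting by 1 costs only 0.1, which is what pushes the fit toward the 90th percentile.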

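for the calibration bullets at the top of the list, a rough sketch of both approaches, written by hand with sklearn building blocks. the dataset, the naive-bayes base classifier, and the three-way split are assumptions chosen only to make the example self-contained; platt's original method is fit on raw decision scores, whereas this sketch feeds in predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# synthetic data (assumption), split into train / calibration / test
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# trained classifier whose probabilities we want to calibrate
clf = GaussianNB().fit(X_train, y_train)
p_cal = clf.predict_proba(X_cal)[:, 1]
p_test = clf.predict_proba(X_test)[:, 1]

# platt scaling: 1-d logistic regression from classifier predictions -> labels
platt = LogisticRegression().fit(p_cal.reshape(-1, 1), y_cal)
p_platt = platt.predict_proba(p_test.reshape(-1, 1))[:, 1]

# isotonic regression: piecewise-constant non-decreasing map instead of a sigmoid
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip").fit(p_cal, y_cal)
p_iso = iso.predict(p_test)
```

both calibrators are fit only on the held-out calibration split, never on the data used to train the classifier.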

rejection learning

  • rejection learning - allow models to reject (not make a prediction) when they are not confidently accurate (chow 1957, cortes et al. 2016)
  • To Trust Or Not To Trust A Classifier (jiang, kim et al 2018) - find a trusted region of points based on nearest neighbor density (in some embedding space)
    • trust score uses density over some set of nearest neighbors
    • do clustering for each class - trust score = distance to one class’s cluster vs the other classes
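
a simplified sketch of the distance-ratio idea behind the trust score. it is not the paper's full procedure: the density-based filtering of each class is omitted, and the function name `trust_scores`, the choice of the k-th neighbor distance, and the gaussian toy data are all assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def trust_scores(X_train, y_train, X_test, predicted, k=5):
    """Ratio: distance to the nearest non-predicted class / distance to the predicted class."""
    classes = np.unique(y_train)
    # distance from each test point to its k-th nearest training neighbor within each class
    dists = np.empty((len(X_test), len(classes)))
    for j, c in enumerate(classes):
        nn = NearestNeighbors(n_neighbors=k).fit(X_train[y_train == c])
        dists[:, j] = nn.kneighbors(X_test)[0][:, -1]
    rows = np.arange(len(X_test))
    pred_idx = np.searchsorted(classes, predicted)
    d_pred = dists[rows, pred_idx]
    d_other = dists.copy()
    d_other[rows, pred_idx] = np.inf  # exclude the predicted class
    return d_other.min(axis=1) / (d_pred + 1e-12)


# toy data (assumption): two well-separated gaussian blobs
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.array([[0.0, 0.0], [2.5, 2.5]])
predicted = np.array([0, 0])
scores = trust_scores(X_train, y_train, X_test, predicted)
```

a score well above 1 means the point sits much closer to the predicted class than to any other class; a score near or below 1 suggests the prediction should not be trusted.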


  • complementarity - ML should focus on points hard for humans + seek human input on points hard for ML
    • note: goal of perception isn’t to learn categories but learn things that are associated with actions
  • Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer (madras et al. 2018) - adaptive rejection learning - build on rejection learning considering the strengths/weaknesses of humans
  • Learning to Complement Humans (wilder et al. 2020) - 2 approaches for how to incorporate human input:
    • discriminative approach - jointly train predictive model and policy for deferring to human (with a cost for deferring)
    • decision-theoretic approach - train predictive model + policy jointly based on value of information (VOI)
    • do real-world experiments w/ humans to validate: scientific discovery (a galaxy classification task) and medical diagnosis (detection of breast cancer metastasis)
  • Gaining Free or Low-Cost Transparency with Interpretable Partial Substitute (wang, 2019) - given a black-box model, find a subset of the data for which predictions can be made using a simple rule-list (tong wang has a few papers like this)
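
a toy sketch of the deferral trade-off, to make the "cost for deferring" concrete. this is a fixed post-hoc rule, not the joint training used in the papers above; the function name `defer_mask`, the per-point human accuracy estimate, and the cost value are hypothetical, and the rule assumes the model's confidence is reasonably calibrated.

```python
import numpy as np


def defer_mask(model_probs, human_acc, defer_cost=0.05):
    """Defer when the expected human error plus the deferral cost is below the expected model error."""
    model_err = 1.0 - np.max(np.asarray(model_probs, dtype=float), axis=1)
    human_err = 1.0 - np.asarray(human_acc, dtype=float)
    return human_err + defer_cost < model_err


# hypothetical inputs: two points, one confident and one uncertain model prediction
model_probs = np.array([[0.99, 0.01], [0.55, 0.45]])
human_acc = np.array([0.9, 0.9])
mask = defer_mask(model_probs, human_acc)
```

here the first point stays with the model and the second is handed to the human, since the model's expected error (0.45) exceeds the human's (0.10) plus the cost (0.05).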

nearest-neighbor methods


  • overview from sklearn
  • elliptic envelope - assume data is Gaussian and fit elliptic envelope (maybe robustly) to tell when data is an outlier
  • isolation forest (liu et al. 2008) - lower average number of random splits required to isolate a sample means more outlier
  • one-class svm - estimates the support of a high-dimensional distribution using a kernel (2 approaches):
    • separate the data from the origin (with max margin between origin and points) (scholkopf et al. 2000)
    • find a sphere boundary around a dataset with the volume of the sphere minimized (tax & duin 2004)
  • detachment index (kuenzel 2019) - based on random forest
    • for covariate $j$, detachment index $d^j(x) = \sum_{i=1}^n w (x, X_i) \vert X_i^j - x^j \vert$
      • $w(x, X_i) = \underbrace{\frac{1}{T}\sum_{t=1}^{T}}_{\text{average over } T \text{ trees}} \frac{\overbrace{\mathbb{1}\{ X_i \in L_t(x) \}}^{\text{is } X_i \text{ in the same leaf?}}}{\underbrace{\vert L_t(x) \vert}_{\text{num points in leaf}}}$ - is $X_i$ relevant to the point $x$?
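
a sketch of the detachment index formula above using sklearn's random forest. the function name `detachment_index`, the forest hyperparameters, and the uniform toy data are assumptions; leaf sizes here are counted over the full training set passed through each tree, which may differ from how the original work counts them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def detachment_index(forest, X_train, x):
    """d^j(x) = sum_i w(x, X_i) * |X_i^j - x^j|, returned for every covariate j."""
    leaves_train = forest.apply(X_train)             # (n, T) leaf id of each X_i in each tree
    leaves_x = forest.apply(x.reshape(1, -1))        # (1, T) leaf id of x in each tree
    same_leaf = leaves_train == leaves_x             # (n, T): is X_i in L_t(x)?
    leaf_sizes = same_leaf.sum(axis=0)               # (T,): |L_t(x)|
    weights = (same_leaf / leaf_sizes).mean(axis=1)  # (n,): average over the T trees
    return weights @ np.abs(X_train - x)             # (p,): one index per covariate


# toy data (assumption): features uniform on the unit square
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(300, 2))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=300)
forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0)
forest.fit(X_train, y_train)

d_in = detachment_index(forest, X_train, np.array([0.5, 0.5]))   # inside the training range
d_out = detachment_index(forest, X_train, np.array([5.0, 5.0]))  # far outside it
```

the point far outside the training range gets a much larger index in every covariate, because the training points sharing its leaves are all far from it.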

predicting uncertainty for DNNs

bayesian approaches