transfer learning

Domain test bed available here, for generalizing to new domains (i.e. performing well on domains that differ from previously seen data)

• Empirical Risk Minimization (ERM, Vapnik, 1998) - standard training
• Invariant Risk Minimization (IRM, Arjovsky et al., 2019) - learns a feature representation such that the optimal linear classifier on top of that representation matches across domains.
• Group Distributionally Robust Optimization (GroupDRO, Sagawa et al., 2020) - ERM + increase importance of domains with larger errors (see also papers from Sugiyama group e.g. 1, 2)
• Variance Risk Extrapolation (VREx, Krueger et al., 2020) - encourages robustness over affine combinations of training risks, by encouraging strict equality between training risks
• Interdomain Mixup (Mixup, Yan et al., 2020) - ERM on linear interpolations of examples from random pairs of domains + their labels
• Marginal Transfer Learning (MTL, Blanchard et al., 2011-2020) - augment original feature space with feature vector marginal distributions and then treat as a supervised learning problem
• Meta Learning Domain Generalization (MLDG, Li et al., 2017) - use MAML to meta-learn how to generalize across domains
• learning more diverse predictors
  • Representation Self-Challenging (RSC, Huang et al., 2020) - adds dropout-like regularization to important features, forcing model to depend on many features
  • Spectral Decoupling (SD, Pezeshki et al., 2020) - regularization which forces model to learn more predictive features, even when only a few suffice
• embedding prior knowledge
  • Style Agnostic Networks (SagNet, Nam et al., 2020) - penalize style features (assumed to be spurious)
  • Penalizing explanations (Rieger et al., 2020) - penalize spurious features using prior knowledge
• Domain adaptation under structural causal models (Chen & Bühlmann, 2020)
  • makes the assumptions needed for domain adaptation to work more explicit
  • introduces CIRM, which works better when both covariates and labels are perturbed in the target data
• kernel approach (Blanchard, Lee & Scott, 2011) - find an appropriate RKHS and optimize a regularized empirical risk over the space
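Several of the objectives listed above differ only in how they aggregate per-domain risks or mix examples. The following is a minimal NumPy sketch under simplifying assumptions: per-domain risks are already computed as plain numbers, domains are equally sized, and the function names, `beta`, `eta`, and `lam` are illustrative choices, not taken from the cited papers or from any library.

```python
import numpy as np

def erm_objective(risks):
    """ERM with equally sized domains: the plain mean of per-domain risks."""
    return float(np.mean(risks))

def vrex_objective(risks, beta=1.0):
    """VREx-style objective: mean risk plus beta times the variance of the
    risks across domains, which pushes the training risks toward equality."""
    risks = np.asarray(risks, dtype=float)
    return float(risks.mean() + beta * risks.var())

def groupdro_weight_step(weights, risks, eta=0.1):
    """GroupDRO-style reweighting: one exponentiated-gradient step that
    increases the weight of domains with larger risk, then renormalizes."""
    w = np.asarray(weights, dtype=float) * np.exp(eta * np.asarray(risks, dtype=float))
    return w / w.sum()

def interdomain_mixup(x_a, y_a, x_b, y_b, lam):
    """Linear interpolation of one example (and its one-hot label) from
    domain A with one example from domain B; lam is in [0, 1]."""
    x = lam * np.asarray(x_a, dtype=float) + (1.0 - lam) * np.asarray(x_b, dtype=float)
    y = lam * np.asarray(y_a, dtype=float) + (1.0 - lam) * np.asarray(y_b, dtype=float)
    return x, y

# Toy per-domain risks: the third domain is much harder than the others.
risks = np.array([0.2, 0.2, 0.8])
uniform = np.ones(3) / 3
new_weights = groupdro_weight_step(uniform, risks, eta=1.0)
weighted_risk = float(new_weights @ risks)  # GroupDRO would minimize this
```

On the toy risks, the variance term raises the VREx objective above the ERM mean, and the reweighting step shifts mass toward the third domain, so the weighted risk exceeds the plain mean.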

## domain invariance

key idea: want the representation to be invariant to the domain label
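One simple proxy for this kind of invariance is to penalize the gap between the mean feature vectors of two domains. The sketch below assumes this first-moment matching penalty purely for illustration. It is not the method of any paper cited in these notes, it only captures differences in means, and the function name is hypothetical.

```python
import numpy as np

def mean_matching_penalty(feats_a, feats_b):
    """Squared Euclidean distance between the mean feature vectors of two
    domains; zero when the first moments of the representations agree."""
    diff = np.asarray(feats_a, dtype=float).mean(axis=0) - np.asarray(feats_b, dtype=float).mean(axis=0)
    return float(diff @ diff)

# Two domains whose features share the same mean: penalty is zero.
feats_a = np.array([[0.0, 1.0], [2.0, 3.0]])
feats_b = np.array([[1.0, 1.0], [1.0, 3.0]])
same = mean_matching_penalty(feats_a, feats_b)

# Shifting the first feature of domain B by 3 makes the domain identifiable
# from the representation, and the penalty grows accordingly.
shifted = mean_matching_penalty(feats_a, feats_b + np.array([3.0, 0.0]))
```

In training, a term like this would be added to the task loss so that the feature extractor is discouraged from encoding which domain an example came from.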

## dynamic selection

Dynamic Selection (DS) refers to techniques in which, for a new test point, pre-trained classifiers are selected/combined from a pool at test time. See the review paper (Cruz et al., 2018) and the Python package.

1. define region of competence
    1. clustering
    2. kNN - more refined than clustering
    3. decision space - e.g. a model's classification boundary, internal splits in a model
    4. potential function - weight all the points (e.g. by their distance to the query point)
2. criteria for selection
    1. individual scores: accuracy, probabilistic, behavior, rank, meta-learning, complexity
    2. group: data handling, ambiguity, diversity
3. combination
    1. non-trainable: mean, majority vote, product, median, etc.
    2. trainable: learn the combination of models
        1. related: in mixture-of-experts, the models and the combination are trained jointly
    3. dynamic weighting: combine using local competence of base classifiers
    4. Oracle baseline - selects a classifier that predicts the correct label, if such a classifier exists
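The three steps above can be sketched with a kNN region of competence, local accuracy as the selection criterion, and competence-weighted voting as the combination. This is a minimal NumPy sketch under assumptions: the pool's predictions on a held-out validation set are precomputed, and all function names are hypothetical and do not mirror the API of the Python package mentioned earlier.

```python
import numpy as np

def knn_region(X_val, x_query, k):
    """Region of competence: indices of the k validation points nearest the query."""
    dists = np.linalg.norm(X_val - x_query, axis=1)
    return np.argsort(dists, kind="stable")[:k]

def local_accuracy(val_preds, y_val, region):
    """Accuracy of each classifier inside the region.
    val_preds has shape (n_classifiers, n_val)."""
    return (val_preds[:, region] == y_val[region]).mean(axis=1)

def dynamic_weighted_vote(query_preds, competence, n_classes):
    """Dynamic weighting: each classifier votes with its local competence."""
    votes = np.zeros(n_classes)
    for pred, weight in zip(query_preds, competence):
        votes[pred] += weight
    return int(np.argmax(votes))

def oracle_correct(query_preds, y_true):
    """Oracle baseline: correct if any classifier in the pool is correct."""
    return bool(np.any(np.asarray(query_preds) == y_true))

# Toy pool: classifier 0 always predicts class 0, classifier 1 always class 1.
X_val = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y_val = np.array([0, 0, 0, 1, 1, 1])
val_preds = np.array([[0, 0, 0, 0, 0, 0],
                      [1, 1, 1, 1, 1, 1]])

x_query = np.array([4.9])
query_preds = np.array([0, 1])          # each classifier's prediction for x_query

region = knn_region(X_val, x_query, k=3)
competence = local_accuracy(val_preds, y_val, region)
selected = int(np.argmax(competence))   # dynamic selection of a single classifier
combined = dynamic_weighted_vote(query_preds, competence, n_classes=2)
```

Near the query, only the second classifier is locally accurate, so both single-classifier selection and weighted voting follow it. The oracle gives an upper bound on what any selection scheme over this pool can achieve.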

• label shift estimation (BBSE, black box shift estimation) - $p(y)$ shifts but $p(x \mid y)$ does not
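Under the label shift assumption, the target distribution of a fixed classifier's predictions is a linear function of the class weights $w_j = p_t(y=j) / p_s(y=j)$, so the weights can be recovered by solving a linear system built from the source confusion matrix. The sketch below assumes hard predictions from a black-box classifier, integer class labels, and an invertible confusion matrix. The function name is hypothetical.

```python
import numpy as np

def bbse_weights(y_src, yhat_src, yhat_tgt, n_classes):
    """Estimate w[j] = p_target(y=j) / p_source(y=j) from predictions only.

    Uses the identity p_target(yhat=i) = sum_j C[i, j] * w[j], where
    C[i, j] = p_source(yhat=i, y=j) is the joint confusion matrix."""
    y_src = np.asarray(y_src)
    yhat_src = np.asarray(yhat_src)
    yhat_tgt = np.asarray(yhat_tgt)

    # Joint confusion matrix on labeled source data.
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (yhat_src, y_src), 1.0)
    C /= len(y_src)

    # Distribution of predicted labels on unlabeled target data.
    mu = np.bincount(yhat_tgt, minlength=n_classes) / len(yhat_tgt)

    # Solve C w = mu; raises LinAlgError if C is singular.
    return np.linalg.solve(C, mu)

# Toy case with a perfect classifier: the source is balanced and the target
# is 25% class 0, 75% class 1, so the weights should be 0.5 and 1.5.
w = bbse_weights(y_src=[0, 0, 1, 1], yhat_src=[0, 0, 1, 1],
                 yhat_tgt=[0, 1, 1, 1], n_classes=2)
```

The estimated weights can then be used as importance weights when retraining or re-evaluating on source data. With few samples the confusion matrix may be ill-conditioned, in which case the estimate becomes unstable.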