Chandan Singh | transfer learning

# transfer learning

Domain test bed available here, for generalizing to new domains (i.e. performing well on domains that differ from previously seen data)

• Empirical Risk Minimization (ERM, Vapnik, 1998) - standard training
• Invariant Risk Minimization (IRM, Arjovsky et al., 2019) - learns a feature representation such that the optimal linear classifier on top of that representation matches across domains.
• Group Distributionally Robust Optimization (GroupDRO, Sagawa et al., 2020) - ERM + increase importance of domains with larger errors (see also papers from Sugiyama group e.g. 1, 2)
• Variance Risk Extrapolation (VREx, Krueger et al., 2020) - encourages robustness over affine combinations of training risks by penalizing the variance of risks across training domains, pushing them toward strict equality
• Interdomain Mixup (Mixup, Yan et al., 2020) - ERM on linear interpolations of examples from random pairs of domains + their labels
• Marginal Transfer Learning (MTL, Blanchard et al., 2011-2020) - augment the original feature space with a feature vector summarizing each domain's marginal distribution, then treat domain generalization as a supervised learning problem
• Meta Learning Domain Generalization (MLDG, Li et al., 2017) - use MAML to meta-learn how to generalize across domains
• learning more diverse predictors
  • Representation Self-Challenging (RSC, Huang et al., 2020) - adds dropout-like regularization that mutes the most important features, forcing the model to depend on many features
  • Spectral Decoupling (SD, Pezeshki et al., 2020) - regularization that forces the model to learn more predictive features, even when only a few suffice
• embedding prior knowledge
  • Style Agnostic Networks (SagNet, Nam et al., 2020) - penalize style features (assumed to be spurious)
  • Penalizing explanations (Rieger et al. 2020) - penalize spurious features using prior knowledge
• Domain adaptation under structural causal models (chen & buhlmann, 2020)
  • makes clearer assumptions for when domain adaptation can work
  • introduces CIRM, which works better when both covariates and labels are perturbed in the target data
• kernel approach (blanchard, lee & scott, 2011) - find an appropriate RKHS and optimize a regularized empirical risk over that space
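The risk-based objectives above (ERM, GroupDRO, VREx) differ mainly in how per-domain risks are aggregated. A minimal numpy sketch, with made-up per-domain risk values (note GroupDRO's actual online weighting uses exponentiated gradients rather than the hard max shown here):

```python
import numpy as np

# Hypothetical average risks (losses) for 3 training domains.
domain_risks = np.array([0.2, 0.5, 1.1])

# ERM: minimize the pooled (mean) risk across domains.
erm_objective = domain_risks.mean()

# GroupDRO: upweight the worst-performing domain (hard max here;
# the paper uses a soft, exponentiated-gradient weighting).
groupdro_objective = domain_risks.max()

# VREx: mean risk plus a penalty on the variance of per-domain risks,
# which pushes the training risks toward strict equality (lam is a knob).
lam = 10.0
vrex_objective = domain_risks.mean() + lam * domain_risks.var()

print(erm_objective, groupdro_objective, vrex_objective)
```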
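Interdomain Mixup is likewise simple to sketch: draw examples from two random domains and train on their convex combination, mixing inputs and labels with the same weight. The toy vectors below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical examples (2 features, one-hot labels) from two domains.
x_a, y_a = np.array([1.0, 0.0]), np.array([1.0, 0.0])  # domain A, class 0
x_b, y_b = np.array([0.0, 1.0]), np.array([0.0, 1.0])  # domain B, class 1

# Mix inputs and labels with the same weight lam ~ Beta(alpha, alpha).
lam = rng.beta(0.2, 0.2)
x_mix = lam * x_a + (1 - lam) * x_b
y_mix = lam * y_a + (1 - lam) * y_b
print(lam, x_mix, y_mix)
```

ERM is then run on the mixed pairs (x_mix, y_mix) instead of the raw examples.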

## domain invariance

key idea: want the learned representation to be invariant to the domain label

• label shift estimation (BBSE) - $p(y)$ shifts but $p(x \mid y)$ does not
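Under label shift, BBSE recovers the importance weights $w(y) = p_t(y)/p_s(y)$ from a black-box classifier without any target labels: solve a linear system pairing the source confusion matrix with the classifier's prediction distribution on target data. A minimal sketch with made-up numbers for a binary problem:

```python
import numpy as np

# Joint "confusion matrix" on held-out source data:
# C[i, j] = p_s(classifier predicts i, true label is j).
C = np.array([[0.35, 0.05],
              [0.05, 0.55]])

# Distribution of the classifier's predictions on unlabeled target data.
mu_t = np.array([0.6, 0.4])

# Solve C w = mu_t for the importance weights w(y) = p_t(y)/p_s(y),
# which can then reweight the source loss or recalibrate predictions.
w = np.linalg.solve(C, mu_t)
print(w)
```

Here class 0 is upweighted (its target proportion grew) and class 1 is downweighted; the estimated target marginal is w times the source marginal p_s(y) (the column sums of C).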