5.8. feature selection

5.8.1. filtering - select based on a summary statistic

  • ranks features or feature subsets independently of the predictor

  • univariate methods (consider one variable at a time)

    • ex. variance threshold

    • ex. t-test for each variable (e.g. does the variable's mean differ between the two classes of y?)

    • ex. correlation screening with the Pearson correlation coefficient, which only captures linear dependencies

    • ex. mutual information, which captures any kind of dependency, including nonlinear ones

    • ex. \(\chi^2\) test, ANOVA F-test

  • multivariate methods

    • feature subset selection

    • need a scoring function

    • need a strategy to search the space of subsets

    • sometimes used as preprocessing for other methods
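A minimal sketch of three univariate filters, written in plain Python so it runs without extra libraries. The function names (`variance_threshold`, `pearson`, `mutual_information`) and the toy data are hypothetical. Mutual information is estimated from empirical counts, which assumes discrete features (continuous ones would need binning first). The toy target y = x² depends on x only nonlinearly, so the Pearson coefficient is 0 while mutual information is clearly positive.

```python
import math
from collections import Counter


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_threshold(columns, threshold=0.0):
    """Return indices of feature columns whose variance exceeds `threshold`."""
    return [j for j, col in enumerate(columns) if variance(col) > threshold]


def pearson(x, y):
    """Pearson correlation coefficient: only measures linear dependence."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete variables,
    estimated from empirical joint and marginal frequencies."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


# Toy data (hypothetical): y depends on x only through x**2.
x = [-2, -1, 0, 1, 2] * 20
y = [v * v for v in x]
constant = [3] * len(x)  # a feature with zero variance

print(variance_threshold([x, constant]))    # -> [0]
print(pearson(x, y))                        # -> 0.0
print(round(mutual_information(x, y), 3))   # -> 1.055
```

Each score is computed per feature without looking at any predictor, which is what makes these filters cheap; the price is that they ignore interactions between features.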

5.8.2. wrapper - select features by training the predictor on candidate subsets

  • uses a predictor to assess features or feature subsets

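  • the learner is treated as a black box: fit on the training set, score each subset on the validation set, and keep the test set for the final evaluation

  • forward selection - start with no features and greedily add the one that most improves the score

  • backward elimination - start with all features and greedily remove the one whose removal hurts the score least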

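  • others (SFS = sequential forward selection, SBS = sequential backward selection)

    • beam search - keep the k best partial subsets at each step

    • GSFS - generalized SFS, adds the best group of several features at each step

    • PTA(l, r) - plus-l take-away-r: add l features with SFS, then remove r with SBS

    • floating search - SFS followed by SBS steps, where the number of backward steps "floats" (they continue only while removals improve the score)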
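A minimal sketch of forward selection and backward elimination in which the learner is a black box: the search only ever calls a `score(features)` function. The nearest-centroid learner, all helper names, and the toy data (feature 0 carries the class, features 1 and 2 are noise) are hypothetical. Subsets are scored on a validation set; a separate test set, not shown, would be held back for the final evaluation.

```python
def nearest_centroid_accuracy(features, train, val):
    """Hypothetical black-box learner: fit one centroid per class on `train`
    using only the given feature indices, then return accuracy on `val`."""
    if not features:
        return 0.0  # an empty subset is treated as useless
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]

    def predict(x):
        # nearest centroid by squared Euclidean distance on the chosen features
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(features, centroids[c])))

    return sum(predict(x) == y for x, y in val) / len(val)


def forward_selection(n_features, score):
    """Start with no features; greedily add the one that most improves the score."""
    selected, best = [], score([])
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        top_score, top_j = max((score(selected + [j]), j) for j in candidates)
        if top_score <= best:
            break  # no single addition improves the score: stop
        selected.append(top_j)
        best = top_score
    return selected, best


def backward_elimination(n_features, score):
    """Start with all features; greedily drop the one whose removal hurts least."""
    selected = list(range(n_features))
    best = score(selected)
    while len(selected) > 1:
        top_score, drop_j = max(
            (score([k for k in selected if k != j]), j) for j in selected)
        if top_score < best:
            break  # every removal lowers the score: stop
        selected.remove(drop_j)  # ties favour the smaller subset
        best = top_score
    return selected, best


# Toy data (hypothetical): feature 0 carries the class, features 1-2 are noise.
train = [([0.0, 5.0, -3.0], 0), ([0.0, -5.0, 3.0], 0),
         ([1.0, 5.0, -3.0], 1), ([1.0, -5.0, 3.0], 1)]
val = [([0.1, 4.0, 2.0], 0), ([0.2, -3.0, -1.0], 0),
       ([0.9, 2.0, -4.0], 1), ([0.8, -1.0, 3.0], 1)]


def score(features):
    # the search only sees this function: the learner stays a black box
    return nearest_centroid_accuracy(features, train, val)


print(forward_selection(3, score))     # -> ([0], 1.0)
print(backward_elimination(3, score))  # -> ([0], 1.0)
```

Both searches keep only the single best move per step; beam search would instead carry the k best subsets forward, at k times the number of learner fits.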

5.8.3. embedded - select from a model

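  • selection happens during training: the predictor builds a model that internally uses only a subset of the features

  • ex. lasso (the L1 penalty sets some coefficients exactly to zero), random forest (feature importances); ridge regression only shrinks coefficients toward zero, so it can rank features but does not remove any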
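A minimal sketch of an embedded method: lasso fitted by cyclic coordinate descent, with ridge as a contrast. The objective scaling (1/(2n))||y - Xw||^2 + lam*||w||_1, the function names, and the toy data are assumptions for illustration, not any particular library's implementation; the data is assumed centred, so no intercept is fitted. The soft-thresholding step is what sets coefficients exactly to zero, so feature selection falls out of fitting the model.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def coordinate_descent(X, y, lam, penalty="l1", n_iters=100):
    """Minimise (1/(2n))||y - Xw||^2 + penalty(w) one coordinate at a time.
    penalty="l1": lam * ||w||_1 (lasso); penalty="l2": (lam/2) * ||w||^2 (ridge).
    Assumes centred X and y (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = z = 0.0
            for i in range(n):
                # residual of row i with feature j left out of the prediction
                partial = y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            if z == 0.0:
                w[j] = 0.0  # an all-zero column carries no information
            elif penalty == "l1":
                w[j] = soft_threshold(rho, lam) / z  # can be exactly 0
            else:
                w[j] = rho / (z + lam)  # shrinks, but stays nonzero if rho != 0


    return w


# Toy data (hypothetical): orthogonal, centred columns; y is mostly feature 0.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.1, 2.9, -2.9, -3.1]  # = 3 * x0 + 0.1 * x1

lasso_w = coordinate_descent(X, y, lam=0.5, penalty="l1")
ridge_w = coordinate_descent(X, y, lam=0.5, penalty="l2")
selected = [j for j, wj in enumerate(lasso_w) if wj != 0.0]
print(selected)  # -> [0]
print(ridge_w)   # ridge keeps a small nonzero weight on feature 1
```

The weak feature 1 has a correlation with y below the penalty lam, so lasso zeroes it and it drops out of the model, while ridge only scales it down.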