4.3. testing

wonderful summarizing blog post

4.3.1. basics

  • data snooping - decide which hypotheses to test after examining data

  • null hypothesis \(H_0\) vs alternative hypothesis \(H_1\)

  • types

    • simple hypothesis \(\theta = \theta_0\)

    • composite hypothesis \(\theta > \theta_0\) or \(\theta < \theta_0\)

    • two-sided test: \(H_0: \theta = \theta_0 \: vs. \: H_1: \theta \neq \theta_0\)

    • one-sided test: \(H_0: \theta \leq \theta_0 \: vs. \: H_1: \theta > \theta_0\)

  • significance levels

    • stat. significant: p < 0.05

    • highly stat. significant: p < 0.01

  • errors

    • \(\alpha\) - type 1 - reject \(H_0\) but \(H_0\) true

    • \(\beta\) - type 2 - fail to reject \(H_0\) but \(H_0\) false

  • p-value = probability, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as contradictory to \(H_0\) as the value calculated from the available sample

  • power: \(1 - \beta\) - probability of rejecting \(H_0\) when it is false

  • adjustments

    • bonferroni procedure - if we run 3 tests and want an overall 5% significance level, we test each at the 5/3% level so the family-wise error rate stays at 5% (see the sketch below)

    • Benjamini–Hochberg procedure - controls for false discovery rate

  • note: ranking is often more important than actual FDR control (because we just need to know what experiments to do)
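
A minimal sketch of both adjustments on made-up p-values (numpy assumed; the function names are mine, not from a library):

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject p_i if p_i <= alpha / m, controlling family-wise error at alpha."""
    m = len(pvals)
    return np.asarray(pvals) <= alpha / m

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest i with p_(i) <= (i/m) * alpha; controls the FDR at alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing its threshold
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.012, 0.014, 0.040, 0.30]
print(bonferroni(pvals))          # stricter: only the smallest p-value survives
print(benjamini_hochberg(pvals))  # looser: rejects the first four
```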

4.3.2. gaussian theory

  • normal theory: assume \(\epsilon_i\) ~ \(N(0, \sigma^2)\)

  • distributions

    • suppose \(U_1, U_2, \ldots\) are iid \(N(0, 1)\)

    • chi-squared: \(\chi_d^2 \sim \sum_{i=1}^d U_i^2\) w/ \(d\) degrees of freedom

      • \((d-1)S^2/\sigma^2 \sim \chi_{d-1}^2\), where \(S^2\) is the sample variance of \(d\) observations

    • student’s t: \(U_{d+1} / \sqrt{d^{-1} \sum_{i=1}^d U_i^2}\) w/ \(d\) degrees of freedom
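
A quick simulation checking both constructions, assuming numpy and scipy (everything here is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n_sim = 5, 100_000
U = rng.standard_normal((n_sim, d + 1))  # iid N(0, 1) draws

chi2_samples = (U[:, :d] ** 2).sum(axis=1)                   # sum of d squared normals
t_samples = U[:, d] / np.sqrt((U[:, :d] ** 2).mean(axis=1))  # U_{d+1} / sqrt(mean of squares)

# simulated quantiles should match the theoretical chi^2_d and t_d quantiles
for q in (0.5, 0.95):
    print(np.quantile(chi2_samples, q), stats.chi2.ppf(q, df=d))
    print(np.quantile(t_samples, q), stats.t.ppf(q, df=d))
```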

  • t-test: test whether a single coefficient (or mean) is nonzero

    • test null \(\theta_k=0\) w/ \(t = \hat{\theta}_k / \hat{SE}\) where \(\hat{SE} = \hat{\sigma} \sqrt{[(X^TX)^{-1}]_{kk}}\)

    • t-test: reject if |t| is large

    • when n-p is large, the t distribution is approximately standard normal and the t-test is called the z-test

    • under null hypothesis t follows t-distr with n-p degrees of freedom

    • here, \(\hat{\theta}\) has a normal distr. with mean \(\theta\) and cov matrix \(\sigma^2 (X^TX)^{-1}\)

      • \(e\) is independent of \(\hat{\theta}\) and \(\|e\|^2 \sim \sigma^2 \chi^2_d\) with d = n-p

    • observed stat. significance level (P-value) = area of the reference (t or normal) curve beyond \(\pm \hat{\theta}_k / \hat{SE}\)

    • if 2 vars are statistically significant, they are said to have independent effects on Y
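
A sketch of the coefficient t-test computed by hand on simulated data (numpy/scipy assumed; a library like statsmodels would report the same t and p-values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
theta_true = np.array([1.0, 2.0, 0.0])        # last coefficient is truly zero
y = X @ theta_true + rng.standard_normal(n)

theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ theta_hat
sigma2_hat = e @ e / (n - p)                  # ||e||^2 / (n - p)
cov = sigma2_hat * np.linalg.inv(X.T @ X)     # estimated cov matrix of theta_hat
se = np.sqrt(np.diag(cov))

t = theta_hat / se                            # t_k = theta_hat_k / SE_k
pvals = 2 * stats.t.sf(np.abs(t), df=n - p)   # two-sided p-values
print(t, pvals)                               # last p-value should be large
```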

  • F-test: test whether any coefficient in a group is nonzero

    • null hypothesis: \(\theta_i = 0\) for \(i = p-p_0+1, \ldots, p\) (the last \(p_0\) coefficients)

    • alternative hypothesis: \(\theta_i \neq 0\) for at least one \(i \in \{p-p_0+1, \ldots, p\}\)

    • \(F = \frac{(\|X\hat{\theta}\|^2 - \|X\hat{\theta}^{(s)}\|^2) / p_0}{\|e\|^2 / (n-p)}\) where \(\hat{\theta}^{(s)}\) has its last \(p_0\) entries set to 0

    • under the null hypothesis, \(F \sim \frac{U/p_0}{V/(n-p)}\) where \(U = \|X\hat{\theta}\|^2 - \|X\hat{\theta}^{(s)}\|^2 \sim \sigma^2 \chi^2_{p_0}\), \(V = \|e\|^2 \sim \sigma^2 \chi_{n-p}^2\), and \(U\) is independent of \(V\)

    • there is also a partial f-test
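
A sketch of this F-test on simulated data where the null is true (the design and coefficients below are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, p0 = 100, 4, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = X[:, :p - p0] @ np.array([1.0, 2.0]) + rng.standard_normal(n)  # last p0 coefs truly 0

theta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
theta_sub, *_ = np.linalg.lstsq(X[:, :p - p0], y, rcond=None)      # submodel fit
e = y - X @ theta_full

# F = (||X theta_full||^2 - ||X theta_sub||^2) / p0, over ||e||^2 / (n - p)
num = (np.sum((X @ theta_full) ** 2) - np.sum((X[:, :p - p0] @ theta_sub) ** 2)) / p0
den = np.sum(e ** 2) / (n - p)
F = num / den
print(F, stats.f.sf(F, p0, n - p))   # F statistic and its p-value
```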

4.3.3. statistical intervals

  • interval estimates come with confidence levels

  • pivot for the mean: \(Z=\frac{\bar{X}-\mu}{\sigma / \sqrt{n}}\)

  • for a proportion \(p\) not close to 0.5 (or for small \(n\)), use the Wilson score confidence interval, which has extra correction terms

  • confidence interval - if multiple samples of trained typists were selected and an interval constructed from each sample mean, 95 percent of these intervals would contain the true preferred keyboard height

    • frequentist idea
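
A minimal sketch of the interval construction, using the t version of the pivot above on a made-up sample:

```python
import numpy as np
from scipy import stats

x = np.array([80.0, 77.5, 79.2, 81.1, 78.4, 80.7])   # made-up sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = stats.t.ppf(0.975, df=n - 1)                # 95% two-sided critical value
ci = (xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))
print(ci)   # ~95% of intervals built this way cover the true mean
```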

4.3.4. tests on hypotheses

  • Var(\(\bar{X}-\bar{Y})=\frac{\sigma_1^2}{m}+\frac{\sigma_2^2}{n}\)

  • tail refers to the side on which we reject (e.g. upper-tailed: \(H_a:\theta>\theta_0\))

  • we try to make the null hypothesis a statement of equality

  • upper-tailed - reject for large values of the test statistic

  • \(\alpha\) is computed using the probability distribution of the test statistic when \(H_0\) is true, whereas determining \(\beta\) requires knowing the test statistic's distribution when \(H_0\) is false

  • type 1 error is usually more serious, so pick the \(\alpha\) level first, then constrain \(\beta\)

  • can standardize values and test these instead
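
A sketch of a two-sample z-test built on this variance formula (the sample sizes, means, and known sigmas below are made up):

```python
import numpy as np
from scipy import stats

m, n = 40, 50
xbar, ybar = 5.2, 4.8
sigma1, sigma2 = 1.0, 1.2        # assumed known, purely for illustration

se = np.sqrt(sigma1**2 / m + sigma2**2 / n)   # sqrt of Var(Xbar - Ybar)
z = (xbar - ybar) / se                        # standardized test statistic for H0: mu1 = mu2
p_upper = stats.norm.sf(z)                    # upper-tailed: reject for large z
p_two = 2 * stats.norm.sf(abs(z))             # two-sided p-value
print(z, p_upper, p_two)
```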

4.3.5. testing LR coefficients

  • confidence interval construction

    • confidence interval (CI) is a range of values likely to include the true value of a parameter of interest

    • confidence level (CL) - probability that the procedure used to determine the CI will produce an interval covering the parameter's value - if we remade it 100 times, 95 would contain the true \(\beta_1\)

  • \(\hat{\beta_0} \pm t_{n-2,\alpha /2} \cdot s.e.(\hat{\beta_0}) \)

    • for \(\beta_1\)

      • with known \(\sigma\)

        • \(\frac{\hat{\beta_1}-\beta_1}{\sigma(\hat{\beta_1})} \sim N(0,1)\)

        • derive CI

      • with unknown \(\sigma\)

        • \(\frac{\hat{\beta_1}-\beta_1}{s(\hat{\beta_1})} \sim t_{n-2}\)

        • derive CI
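
A sketch deriving the slope CI from the unknown-\(\sigma\) pivot above, on simulated data (scipy.stats.linregress would report the same standard error):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 30)
y = 1.0 + 0.5 * x + rng.standard_normal(30)   # true slope 0.5

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # slope estimate
b0 = y.mean() - b1 * x.mean()                        # intercept estimate
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # sigma_hat
se_b1 = s / np.sqrt(Sxx)                             # s(beta_1_hat)

t_crit = stats.t.ppf(0.975, df=n - 2)
print((b1 - t_crit * se_b1, b1 + t_crit * se_b1))    # 95% CI for beta_1
```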

4.3.6. ANOVA (analysis of variance)

  • y - called dependent, response variable

  • x - independent, explanatory, predictor variable

  • notation: \(E(Y \mid x^*) = \mu_{Y\cdot x^*} =\) mean value of Y when x = \(x^*\)

  • \(Y = f(x) + \epsilon\)

  • linear: \(Y=\beta_0+\beta_1 x+\epsilon\)

  • logistic: \(odds = \frac{p(x)}{1-p(x)}=e^{\beta_0+\beta_1 x}\) (the randomness is in the Bernoulli response, not an additive error term)

  • we minimize least squares: \(SSE = \sum_{i=1}^n (y_i-(b_0+b_1x_i))^2\)

  • \(b_1=\hat{\beta_1}=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} = \frac{S_{xy}}{S_{xx}}\)

  • \(b_0=\bar{y}-\hat{\beta_1}\bar{x}\)

  • \(S_{xy}=\sum x_iy_i-\frac{(\sum x_i)(\sum y_i)}{n}\)

  • \(S_{xx}=\sum x_i^2 - \frac{(\sum x_i)^2}{n}\)

  • residuals: \(y_i-\hat{y_i}\)

  • SSE = \(\sum y_i^2 - \hat{\beta}_0 \sum y_i - \hat{\beta}_1 \sum x_iy_i\)

  • SST = total sum of squares = \(S_{yy} = \sum (y_i-\bar{y})^2 = \sum y_i^2 - (\sum y_i)^2/n\)

  • \(r^2 = 1-\frac{SSE}{SST}=\frac{SSR}{SST}\) - proportion of observed variation that can be explained by regression

  • \(\hat{\sigma}^2 = \frac{SSE}{n-2}\)

  • \(T=\frac{\hat{\beta}_1-\beta_1}{S / \sqrt{S_{xx}}}\) has a t distr. with n-2 df

  • \(s_{\hat{\beta_1}}=\frac{s}{\sqrt{S_{xx}}}\)

  • \(s_{\hat{\beta_0}+\hat{\beta_1}x^*} = s\sqrt{\frac{1}{n}+\frac{(x^*-\bar{x})^2}{S_{xx}}}\)

  • sample correlation coefficient \(r = \frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}}\)

  • this is a point estimate for population correlation coefficient = \(\frac{Cov(X,Y)}{\sigma_X\sigma_Y}\)

  • apply the Fisher transformation \(z = \text{arctanh}(r)\) to get an approximately normal test statistic for the correlation (see the sketch below)
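
A sketch of the correlation test via the Fisher transformation, using the \(S_{xy}, S_{xx}, S_{yy}\) shortcut formulas above (data simulated; under \(H_0: \rho = 0\), \(\sqrt{n-3}\,\text{arctanh}(r)\) is approximately standard normal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.standard_normal(50)
y = 0.6 * x + 0.8 * rng.standard_normal(50)   # true correlation 0.6
n = len(x)

Sxy = np.sum(x * y) - x.sum() * y.sum() / n   # shortcut formulas from above
Sxx = np.sum(x**2) - x.sum()**2 / n
Syy = np.sum(y**2) - y.sum()**2 / n
r = Sxy / np.sqrt(Sxx * Syy)                  # sample correlation

z = np.arctanh(r) * np.sqrt(n - 3)            # Fisher-transformed statistic
print(r, 2 * stats.norm.sf(abs(z)))           # two-sided p-value for H0: rho = 0
```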

  • degrees of freedom

    • one-sample T: n-1

    • T procedures with paired data: n-1

    • T procedures for 2 independent populations: use the Welch–Satterthwaite formula, roughly the smaller of \(n_1-1\) and \(n_2-1\)

    • variance estimate \(\hat{\sigma}^2\) in simple regression: n-2

  • use a z-test if the population standard deviation is known
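
A short sketch contrasting the last two bullets: a one-sample t-test with n-1 df when \(\sigma\) is estimated, vs. a z-test when \(\sigma\) is known (sample and \(\sigma\) made up):

```python
import numpy as np
from scipy import stats

x = np.array([2.1, 1.8, 2.6, 2.4, 1.9, 2.3, 2.2])   # made-up sample
mu0, sigma_known = 2.0, 0.3
n = len(x)

t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # sigma estimated from data
print(2 * stats.t.sf(abs(t), df=n - 1))              # t-test, n-1 df

z = (x.mean() - mu0) / (sigma_known / np.sqrt(n))    # sigma known
print(2 * stats.norm.sf(abs(z)))                     # z-test
```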