Section | Topic |
---|---|
general | intro, linear algebra, gaussian, parameter estimation, bias-variance |
regression | lin reg, LS, kernels, sparsity |
dim reduction | dim reduction |
classification | discr. vs. generative, nearest neighbor, DNNs, log. regression, lda/qda, decision trees, svms |
optimization | problems, algorithms, duality, boosting, em |
equivalent to the triangle inequality
n × p matrix:
function f:
function f: $\mathbb{R}^n \to \mathbb{R}$, Hessian:

$$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
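As a quick worked example of the Hessian (a standard illustration, not taken from this sheet): for a quadratic in $x \in \mathbb{R}^n$,

$$f(x) = \tfrac{1}{2} x^T A x + b^T x \;\Rightarrow\; \nabla f(x) = Ax + b, \qquad \nabla^2 f(x) = A \quad \text{(for symmetric } A\text{)}.$$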
$x^T A x = \operatorname{tr}(x x^T A) = \sum_{i,j} x_i A_{i,j} x_j$
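A minimal numeric sanity check of this identity (a sketch; the array shapes and names are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)           # vector x
A = rng.standard_normal((4, 4))      # arbitrary square matrix A

quad  = x @ A @ x                                 # x^T A x
trace = np.trace(np.outer(x, x) @ A)              # tr(x x^T A)
summ  = sum(x[i] * A[i, j] * x[j] for i in range(4) for j in range(4))
assert np.allclose(quad, trace) and np.allclose(quad, summ)
```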
n = number of data points
d = dimension of each data point
Model | Loss |
---|---|
OLS | $\lVert y - X\theta \rVert_2^2$ |
Ridge | $\lVert y - X\theta \rVert_2^2 + \lambda \lVert \theta \rVert_2^2$ |
Lasso | $\lVert y - X\theta \rVert_2^2 + \lambda \lVert \theta \rVert_1$ |
Elastic Net | $\lVert y - X\theta \rVert_2^2 + \lambda_1 \lVert \theta \rVert_1 + \lambda_2 \lVert \theta \rVert_2^2$ |
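OLS and Ridge can be minimized in closed form; the sketch below shows both (the toy data and variable names are assumptions for illustration). Lasso and Elastic Net have no closed form and are typically solved with coordinate descent or proximal methods.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))                 # n x d design matrix (toy data)
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

lam = 0.5                                         # regularization strength (illustrative)
theta_ols   = np.linalg.solve(X.T @ X, X.T @ y)                    # argmin ||y - X theta||_2^2
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)  # argmin ||y - X theta||_2^2 + lam ||theta||_2^2
```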
$\hat{\theta}_{\text{Bayes}} = \mathbb{E}_{p(\theta \mid x)}[\theta]$ (posterior mean)
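For a concrete case (a standard conjugate-Gaussian example, not specific to this sheet): with prior $\theta \sim \mathcal{N}(0, \tau^2)$ and a single observation $x \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$,

$$\hat{\theta}_{\text{Bayes}} = \mathbb{E}_{p(\theta \mid x)}[\theta] = \frac{\tau^2}{\tau^2 + \sigma^2}\, x,$$

i.e. the observation shrunk toward the prior mean 0.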
adding i.i.d. Gaussian noise to x and y acts as a form of regularization
PCA finds orthogonal directions that maximize the variance of the projected data:
```python
import numpy as np

X -= np.mean(X, axis=0)            # zero-center the data (n x d)
cov = np.dot(X.T, X) / X.shape[0]  # covariance matrix (d x d)
U, D, Vt = np.linalg.svd(cov)      # SVD: U, Vt are d x d, D holds the singular values
X_2d = np.dot(X, U[:, :2])         # project onto the top-2 principal components (n x 2)
```
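A possible follow-up, continuing with `U`, `D`, and `X` from the snippet above (the 95% threshold is just an illustrative choice): the singular values indicate how much variance each principal component explains.

```python
explained = D / D.sum()                                # fraction of variance per component
k = np.searchsorted(np.cumsum(explained), 0.95) + 1    # smallest k capturing ~95% of the variance
X_k = np.dot(X, U[:, :k])                              # project onto the top-k components (n x k)
```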
invariant to scalings / affine transformations of X, Y
$$\hat{y}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{j=1}^{n} K_h(x - x_j)}$$
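A minimal numpy sketch of this kernel-weighted estimator with a Gaussian kernel (the function name, bandwidth, and toy data are assumptions):

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, h):
    """Kernel-weighted average of y_train at x_query (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x_query - x_train) / h) ** 2)   # K_h(x - x_i), up to a constant that cancels
    return np.sum(w * y_train) / np.sum(w)

# usage on toy 1-D data
rng = np.random.default_rng(2)
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(50)
y_hat = nadaraya_watson(0.3, x_train, y_train, h=0.05)
```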
Hessian
M-smooth = Lipschitz continuous gradient: $\|\nabla f(x) - \nabla f(y)\| \le M\,\|x - y\|$ for all $x, y$
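For instance (a standard example, not from this sheet): for the quadratic $f(x) = \tfrac{1}{2} x^T A x$ with symmetric $A$,

$$\|\nabla f(x) - \nabla f(y)\| = \|A(x - y)\| \le \|A\|_2\, \|x - y\|,$$

so $f$ is $M$-smooth with $M = \|A\|_2$ (the largest eigenvalue magnitude of $A$).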