Section | Topic |
---|---|
general | intro, linear algebra, gaussian, parameter estimation, bias-variance |
regression | lin reg, LS, kernels, sparsity |
dim reduction | dim reduction |
classification | discr. vs. generative, nearest neighbor, DNNs, log. regression, lda/qda, decision trees, svms |
optimization | problems, algorithms, duality, boosting, em |
norm subadditivity $\|x + y\| \le \|x\| + \|y\|$ (equivalent to the triangle inequality)
Jacobian (an n×p matrix) of a function $f: \mathbb{R}^p \to \mathbb{R}^n$: $J_{ij} = \partial f_i / \partial x_j$

gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$: $\nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)^T$

Hessian of a function $f: \mathbb{R}^n \to \mathbb{R}$:

$$\nabla^2 f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
$x^T A x = \operatorname{tr}(x x^T A) = \sum_{i,j} x_i A_{ij} x_j$
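A quick numpy sanity check of this identity (toy random values, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

quad = x @ A @ x                    # x^T A x
tr = np.trace(np.outer(x, x) @ A)   # tr(x x^T A)
assert np.isclose(quad, tr)
```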
n = number of data points
d = dimension of each data point
Model | Loss |
---|---|
OLS | $\|Xw - y\|_2^2$ |
Ridge | $\|Xw - y\|_2^2 + \lambda \|w\|_2^2$ |
Lasso | $\|Xw - y\|_2^2 + \lambda \|w\|_1$ |
Elastic Net | $\|Xw - y\|_2^2 + \lambda_1 \|w\|_1 + \lambda_2 \|w\|_2^2$ |
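OLS and ridge have closed-form minimizers, $w^* = (X^TX)^{-1}X^Ty$ and $w^* = (X^TX + \lambda I)^{-1}X^Ty$ (lasso and elastic net do not, since the $\ell_1$ term is non-differentiable). A minimal numpy sketch on toy data, with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                                # n x d design matrix
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)  # noisy linear targets

lam = 1.0
w_ols = np.linalg.solve(X.T @ X, X.T @ y)                        # (X^T X)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)    # (X^T X + lam*I)^{-1} X^T y
```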
$\hat{\theta}_{\text{Bayes}} = \mathbb{E}_{\theta \sim p(\theta \mid x)}[\theta]$
adding i.i.d. Gaussian noise to x and y during training acts as regularization (for linear regression, input noise is equivalent in expectation to a ridge penalty)
PCA: find orthogonal directions that maximize the variance of the projected data
```python
import numpy as np

X -= np.mean(X, axis=0)          # zero-center data (n x d)
cov = X.T @ X / X.shape[0]       # covariance matrix (d x d)
U, S, Vt = np.linalg.svd(cov)    # SVD; columns of U are the principal directions (all d x d)
X_2d = X @ U[:, :2]              # project onto the top-2 components (n x 2)
```
invariant to scalings and affine transformations of X and Y
$\hat{y}_h(x) = \dfrac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{j=1}^{n} K_h(x - x_j)}$
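A minimal 1-D sketch of this estimator with a Gaussian kernel; the bandwidth value and helper names are illustrative:

```python
import numpy as np

def gaussian_kernel(u, h):
    return np.exp(-0.5 * (u / h) ** 2)

def kernel_regression(x, X_train, y_train, h=0.5):
    """Kernel-weighted average of the training targets."""
    w = gaussian_kernel(x - X_train, h)   # K_h(x - x_i) for all i
    return w @ y_train / w.sum()

rng = np.random.default_rng(0)
X_train = np.linspace(0, 2 * np.pi, 50)
y_train = np.sin(X_train) + 0.1 * rng.standard_normal(50)
print(kernel_regression(np.pi / 2, X_train, y_train))   # ~ sin(pi/2) = 1
```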
M-smooth = Lipschitz continuous gradient; for twice-differentiable f, equivalent to the Hessian bound $\nabla^2 f(x) \preceq M I$:

Lipschitz continuous f | M-smooth |
---|---|
$|f(x) - f(y)| \le L \|x - y\|$ | $\|\nabla f(x) - \nabla f(y)\| \le M \|x - y\|$ |
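One reason smoothness matters: for M-smooth f, gradient descent with step size 1/M is guaranteed to decrease the objective (the standard descent lemma):

$$f\!\left(x - \tfrac{1}{M} \nabla f(x)\right) \le f(x) - \tfrac{1}{2M} \|\nabla f(x)\|_2^2$$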
also see the neural-net demo playground (e.g., playground.tensorflow.org)
```python
from numpy import exp, array, random

X = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])  # 4 examples, 3 features
Y = array([[0, 1, 1, 0]]).T                              # targets (4 x 1)
w = 2 * random.random((3, 1)) - 1                        # init weights in [-1, 1)
for iteration in range(10000):
    Yhat = 1 / (1 + exp(-(X @ w)))                       # sigmoid forward pass
    w += X.T @ ((Y - Yhat) * Yhat * (1 - Yhat))          # gradient step through the sigmoid
print(1 / (1 + exp(-(array([1, 0, 0]) @ w))))            # predict on a new input
```
deep learning frameworks: `import tensorflow as tf` (TensorFlow), `import torch` (PyTorch)
EM: want to maximize the complete-data log-likelihood $\log p(x, z \mid \theta)$; since the latent variables z are unobserved, EM maximizes its expected value instead (see the updates below)
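The standard EM alternation, written out (z denotes the latent variables, $\theta^{(t)}$ the current parameters):

$$\text{E-step:}\quad Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z \mid x,\, \theta^{(t)})}\!\left[\log p(x, z \mid \theta)\right]$$

$$\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)})$$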
note 20 is good
decision boundary: $\{x : w^T x + b = 0\}$; can rewrite by absorbing b into w (append a constant-1 feature to x, so $w^T x + b = \tilde{w}^T \tilde{x}$)
Model | Loss |
---|---|
Perceptron | $\max(0, -y\, w^T x)$ |
Linear SVM | $\max(0, 1 - y\, w^T x) + \lambda \|w\|_2^2$ |
Logistic regression | $\log(1 + e^{-y\, w^T x})$ |

(margin $m = y\, w^T x$ with labels $y \in \{-1, +1\}$)
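A small numpy sketch comparing these surrogate losses as functions of the margin m (function names are illustrative; the SVM regularizer is omitted since it is not pointwise):

```python
import numpy as np

def perceptron_loss(m):
    return np.maximum(0, -m)

def hinge_loss(m):
    return np.maximum(0, 1 - m)

def logistic_loss(m):
    return np.log1p(np.exp(-m))

m = np.linspace(-2, 2, 5)   # margins y * w^T x
print(perceptron_loss(m))
print(hinge_loss(m))
print(logistic_loss(m))
```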
$$p^* = \min_x f_0(x) \quad \text{s.t.}\quad f_i(x) \le 0,\;\; h_i(x) = 0$$

$$d^* = \max_{\lambda \succeq 0,\, \nu}\; \underbrace{\inf_x\, \underbrace{f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)}_{\text{Lagrangian } L(x, \lambda, \nu)}}_{\text{dual function } g(\lambda, \nu)}$$
dual function: $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$; always concave, even when the primal is not convex

weak duality: $d^* \le p^*$ (always holds)

strong duality: $d^* = p^*$ (holds for convex problems under Slater's condition: some strictly feasible point exists)
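A worked example: for the equality-constrained least-norm problem $\min_w w^T w$ s.t. $Aw = b$, the Lagrangian is $L(w, \nu) = w^T w + \nu^T (Aw - b)$; setting $\nabla_w L = 2w + A^T \nu = 0$ gives $w = -\tfrac{1}{2} A^T \nu$, so

$$g(\nu) = -\tfrac{1}{4} \nu^T A A^T \nu - b^T \nu,$$

a concave quadratic in $\nu$; the problem is convex with affine constraints, so strong duality holds and $\max_\nu g(\nu) = p^*$.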
lasso: $\min_w \|Xw - y\|_2^2 \;\text{s.t.}\; \|w\|_1 \le k$; equivalent penalized form $\min_w \|Xw - y\|_2^2 + \lambda \|w\|_1$

ridge: $\min_w \|Xw - y\|_2^2 \;\text{s.t.}\; \|w\|_2 \le k$; equivalent penalized form $\min_w \|Xw - y\|_2^2 + \lambda \|w\|_2^2$
decision trees: choose the split that maximizes information gain $= H(\text{parent}) - \sum_c \frac{n_c}{n} H(\text{child}_c)$, i.e. the parent's entropy minus the weighted average of the children's entropies (sketch below)
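A minimal entropy / information-gain computation for a binary split (helper names and the toy labels are illustrative):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector (0 for an empty split)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """H(parent) minus the weighted average entropy of the two children."""
    n = len(y)
    return (entropy(y)
            - mask.sum() / n * entropy(y[mask])
            - (~mask).sum() / n * entropy(y[~mask]))

y = np.array([0, 0, 1, 1, 1, 0])
mask = np.array([True, True, True, False, False, False])
print(information_gain(y, mask))   # gain of this particular split
```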
boosting: sequentially train many weak learners, each one fit to the errors of the ensemble so far, to approximate a function (sketch below)
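A minimal sketch of the idea as L2 boosting with decision stumps, each round fitting a stump to the current residuals; the dataset, shrinkage value, and helper names are all illustrative:

```python
import numpy as np

def fit_stump(x, r):
    """Pick the threshold whose two leaf means best fit the residuals r."""
    best_err, best = np.inf, None
    for t in x:
        left = x <= t
        right = ~left
        if not right.any():
            continue
        cl, cr = r[left].mean(), r[right].mean()
        err = np.sum((np.where(left, cl, cr) - r) ** 2)
        if err < best_err:
            best_err, best = err, (t, cl, cr)
    return best

def predict_stump(stump, x):
    t, cl, cr = stump
    return np.where(x <= t, cl, cr)

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)

pred, stumps, lr = np.zeros_like(y), [], 0.1
for _ in range(100):                     # each round fits a stump to the residual y - pred
    stump = fit_stump(x, y - pred)
    stumps.append(stump)
    pred += lr * predict_stump(stump, x)
print("train MSE:", np.mean((y - pred) ** 2))   # shrinks as rounds accumulate
```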