| Section | Topic |
| --- | --- |
| general | intro, linear algebra, gaussian, parameter estimation, bias-variance |
| regression | lin reg, LS, kernels, sparsity |
| dim reduction | dim reduction |
| classification | discr. vs. generative, nearest neighbor, DNNs, log. regression, lda/qda, decision trees, svms |
| optimization | problems, algorithms, duality, boosting, em |
equivalent to the triangle inequality
nxp matrix:
function f:
function f:
n = number of data points
d = dimension of each data point
Model | Loss |
Ridge |
Lasso |
Elastic Net |
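A minimal sketch of the three penalized losses in their usual textbook form, since the loss column above is empty. The names `beta`, `lam`, `lam1`, `lam2` and the toy data are assumptions for illustration, not notation taken from these notes.

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    """Squared error plus an L2 penalty on the coefficients."""
    resid = y - X @ beta
    return resid @ resid + lam * np.sum(beta ** 2)

def lasso_loss(X, y, beta, lam):
    """Squared error plus an L1 penalty (encourages sparse coefficients)."""
    resid = y - X @ beta
    return resid @ resid + lam * np.sum(np.abs(beta))

def elastic_net_loss(X, y, beta, lam1, lam2):
    """Squared error plus both the L1 and the L2 penalty."""
    resid = y - X @ beta
    return resid @ resid + lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)

# hypothetical toy data (n = 3 points, d = 2 dimensions); beta fits y exactly,
# so only the penalty terms contribute to each loss
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

ridge_value = ridge_loss(X, y, beta, lam=0.5)
lasso_value = lasso_loss(X, y, beta, lam=0.5)
enet_value = elastic_net_loss(X, y, beta, lam1=0.5, lam2=0.5)
```

With a zero residual the three values differ only through the penalty, which makes the L1 vs. L2 contrast easy to see.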
$\hat{\theta}_{\text{Bayes}} = \mathbb{E}[\theta \mid x] = \int \theta \, p(\theta \mid x) \, d\theta$
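Reading the Bayes estimate as the posterior mean of θ given x, it can be checked numerically on a conjugate example. This is a sketch under assumptions: the Beta(a, b) prior, the Bernoulli likelihood, and the counts `k`, `n` are hypothetical choices, not from these notes.

```python
import numpy as np

a, b = 2.0, 2.0   # assumed Beta prior parameters
k, n = 7, 10      # assumed data: k successes in n Bernoulli trials

# unnormalized posterior p(theta | x) on a grid: Beta(a + k, b + n - k)
theta = np.linspace(0.0, 1.0, 10001)
unnorm = theta ** (a + k - 1) * (1 - theta) ** (b + n - k - 1)

# posterior mean as a ratio of Riemann sums (the grid spacing cancels)
theta_bayes_numeric = np.sum(theta * unnorm) / np.sum(unnorm)

# closed-form posterior mean of the Beta posterior
theta_bayes_closed = (a + k) / (a + b + n)
```

The two values should agree closely, and both sit between the prior mean a / (a + b) and the sample frequency k / n.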
adding i.i.d. Gaussian noise to x and y acts as regularization
orthogonal dimensions that maximize variance of X
import numpy as np
X -= np.mean(X, axis=0) #zero-center data (nxd)
cov = np.dot(X.T, X) / X.shape[0] #get cov. matrix (dxd)
U, D, V = np.linalg.svd(cov) #compute svd, (all dxd)
X_2d = np.dot(X, U[:, :2]) #project in 2d (nx2)
invariant to scalings / affine transformations of X, Y
M-smooth = Lipschitz continuous gradient: $\|\nabla f(x) - \nabla f(y)\| \le M \|x - y\|$ for all $x, y$
Lipschitz continuous f | M-smooth |
also see nn demo playground
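from numpy import exp, array, random

# toy data: 4 examples with 3 binary features; the label equals the first feature
X = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
Y = array([[0, 1, 1, 0]]).T
w = 2 * random.random((3, 1)) - 1  # weights initialized uniformly in [-1, 1)
for iteration in range(10000):
    Yhat = 1 / (1 + exp(-(X @ w)))  # forward pass: sigmoid of linear scores
    # update: error scaled by the sigmoid derivative Yhat * (1 - Yhat)
    w += X.T @ ((Y - Yhat) * Yhat * (1 - Yhat))
print(1 / (1 + exp(-(array([1, 0, 0]) @ w))))  # prediction for a new input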
import tensorflow as tf
import torch
want to maximize complete log-likelihood
note 20 is good
decision boundary: {
can rewrite by absorbing
Model |
Perceptron |
Linear SVM |
Logistic regression |
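$$d^* = \max_{\lambda, \nu} \; \underbrace{\inf_x \; \underbrace{f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)}_{\text{Lagrangian } L(x, \lambda, \nu)}}_{\text{dual function } g(\lambda, \nu)} \quad \text{s.t. } \lambda \succeq 0$$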
dual function
weak duality: $d^* \le p^*$ (where $p^*$ is the primal optimal value)
strong duality: $d^* = p^*$
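A small numeric sketch of the dual function and the duality gap on a grid. The problem (minimize x² subject to 1 − x ≤ 0) and the name `p_star` for the primal optimum are assumptions chosen for illustration; the problem is convex and strictly feasible, so the gap is expected to vanish.

```python
import numpy as np

def f0(x):
    return x ** 2          # objective

def f1(x):
    return 1.0 - x         # inequality constraint, feasible when f1(x) <= 0

xs = np.linspace(-5.0, 5.0, 2001)    # grid over the primal variable
lams = np.linspace(0.0, 5.0, 501)    # grid over the multiplier, lambda >= 0

# primal optimum p*: smallest objective over feasible grid points
# (small tolerance so the boundary point x = 1 survives rounding)
feasible = f1(xs) <= 1e-9
p_star = f0(xs[feasible]).min()

# dual function g(lambda) = inf_x L(x, lambda), evaluated row by row
lagrangian = f0(xs)[None, :] + lams[:, None] * f1(xs)[None, :]
g = lagrangian.min(axis=1)

# dual optimum d*: best lower bound over lambda >= 0
d_star = g.max()
best_lam = lams[g.argmax()]
```

Here g(λ) = λ − λ²/4 analytically, which peaks at λ = 2 with value 1, matching the primal optimum at x = 1.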
maximize information gain: H(parent) - [weighted average] H(children)
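The split criterion can be sketched as follows, assuming H is Shannon entropy in bits and the weights are the children's share of the parent's samples. The helper names and the toy labels are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """H(parent) minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = np.array([0, 0, 1, 1])

# a split that separates the classes perfectly
perfect = information_gain(parent, [parent[:2], parent[2:]])

# a split that leaves each child as mixed as the parent
useless = information_gain(parent, [parent[[0, 2]], parent[[1, 3]]])
```

The perfect split recovers the full parent entropy as gain, while the useless split gains nothing.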
sequentially train many weak learners to approximate a function
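One way to make this concrete is gradient boosting for squared loss with regression stumps, where each weak learner is fit to the current residuals. This is a sketch of that specific variant, not necessarily the boosting algorithm these notes have in mind; `fit_stump`, `boost`, the learning rate, and the sine target are all assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold regression stump on 1-D inputs x for targets r."""
    best = None
    for t in np.unique(x)[:-1]:                  # keep both sides non-empty
        left, right = r[x <= t], r[x > t]
        pred_l, pred_r = left.mean(), right.mean()
        err = np.sum((left - pred_l) ** 2) + np.sum((right - pred_r) ** 2)
        if best is None or err < best[0]:
            best = (err, t, pred_l, pred_r)
    _, t, pred_l, pred_r = best
    return lambda z: np.where(z <= t, pred_l, pred_r)

def boost(x, y, n_rounds=50, lr=0.5):
    """Sequentially fit stumps to the residuals of the running prediction."""
    stumps = []
    pred = np.zeros_like(y)
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)           # weak learner on residuals
        pred = pred + lr * stump(x)              # shrunken additive update
        stumps.append(stump)
    return lambda z: lr * sum(s(z) for s in stumps)

# hypothetical target function to approximate
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)

model = boost(x, y)
mse_one = np.mean((y - fit_stump(x, y)(x)) ** 2)   # a single weak learner
mse_boosted = np.mean((y - model(x)) ** 2)          # the boosted ensemble
```

Comparing `mse_one` with `mse_boosted` shows how the sum of many crude stumps fits the training targets far better than any one of them.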