# 3.4. linear algebra

## 3.4.1. linear basics

### 3.4.1.1. notation

• $$x \preceq y$$ - these are vectors and x is less than or equal to y elementwise

• $$X \preceq Y$$ - matrices, $$Y-X$$ is PSD

• $$v^TXv \leq v^TYv \:\: \forall v$$
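a minimal numpy sketch (my illustration, not from the notes) of checking the matrix order: build a PSD difference and confirm $$X \preceq Y$$ by testing the eigenvalues of $$Y-X$$:

```python
import numpy as np

# check X <= Y in the PSD (Loewner) order by testing eigenvalues of Y - X
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T                # PSD by construction (a Gram matrix)
Y = X + np.eye(4)          # Y - X = I, which is PD, so X <= Y should hold

eigs = np.linalg.eigvalsh(Y - X)   # eigvalsh: eigenvalues of a symmetric matrix
print(np.all(eigs >= -1e-10))      # True: Y - X is PSD, i.e. X <= Y
```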

### 3.4.1.2. linearity

• inner product $$\langle X, Y \rangle = tr(X^TY) = \sum_i \sum_j X_{ij} Y_{ij}$$

• like the usual inner product if we collapsed X and Y each into one big vector

• linear

• symmetric

• gives angle back: $$\cos \theta = \langle x, y \rangle \: / \: (||x|| \; ||y||)$$

• linear

1. superposition $$f(x+y) = f(x)+f(y)$$

2. proportionality $$f(k\cdot x) = k \cdot f(x)$$

• bilinear means a function of two arguments that is linear in each argument separately

• vector space

1. closed under addition and scalar multiplication

2. contains the zero vector (additive identity)

• det - signed sum of products, each taking exactly one element from every row and column

• absolute value = volume of the parallelepiped spanned by the rows (or cols); area of a parallelogram in 2D

• lin independent: $$c_1x_1+c_2x_2=0 \implies c_1=c_2=0$$

• Cauchy-Schwarz inequality: $$|x^T y| \leq ||x||_2 ||y||_2$$

• implies triangle inequality: $$||x+y||^2 \leq (||x|| + ||y||)^2$$ (both checked numerically below)
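a quick numeric spot check (illustrative only, not from the notes) of both inequalities on random vectors:

```python
import numpy as np

# verify Cauchy-Schwarz and the triangle inequality for random x, y
rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# |x^T y| <= ||x||_2 ||y||_2
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))
# ||x + y|| <= ||x|| + ||y||
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))
```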

### 3.4.1.3. matrix properties

• $$x^TAx = tr(xx^TA)$$

• nonsingular = invertible = nonzero determinant = null space is $$\{0\}$$

• defined only for square matrices

• rank of an mxn matrix - max number of linearly independent columns (equivalently rows)

• if rank = m = n, then nonsingular

• ill-conditioned matrix - matrix is close to singular, i.e. has a large condition number (a very small determinant is a common symptom)

• inverse

• orthogonal matrix: all columns are orthonormal

• $$A^{-1} = A^T$$

• preserves the Euclidean norm $$||Ax||_2 = ||x||_2$$

• if diagonal, the inverse is diagonal with each entry inverted

• inverting a 3x3 - transpose, find all the cofactor mini-dets, apply the alternating signs, divide by the det

• pseudo-inverse = Moore-Penrose inverse; for full column rank, $$A^\dagger = (A^T A)^{-1} A^T$$

• if A is nonsingular, $$A^\dagger = A^{-1}$$

• if rank(A) = m (full row rank), invert via $$A A^T$$: $$A^\dagger = A^T (A A^T)^{-1}$$

• if rank(A) = n (full column rank), invert via $$A^T A$$: $$A^\dagger = (A^T A)^{-1} A^T$$ (sketch below)

• inversion of matrix is $$\approx O(n^3)$$

• inverse of a PD symmetric matrix is also PD and symmetric

• if A, B invertible $$(AB)^{-1} = B^{-1} A^{-1}$$
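a short numpy sketch (illustrative; A gets full column rank by construction) comparing the left-inverse formula against the SVD-based `np.linalg.pinv`:

```python
import numpy as np

# tall matrix: rank(A) = n almost surely, so A^+ = (A^T A)^{-1} A^T applies
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

A_dag = np.linalg.inv(A.T @ A) @ A.T           # left inverse: A_dag @ A = I_n
print(np.allclose(A_dag @ A, np.eye(3)))       # True
print(np.allclose(A_dag, np.linalg.pinv(A)))   # matches numpy's pseudo-inverse
```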

• orthogonal complement of a subspace - the set of all vectors orthogonal to every vector in it

• define R(A) to be the range space of A (column space) and N(A) to be the null space of A, for A an mxn matrix of rank r

• $$R(A^T)$$ and $$N(A)$$ are orthogonal complements (likewise $$R(A)$$ and $$N(A^T)$$)

• dim $$R(A)$$ = r

• dim $$N(A)$$ = n-r

• dim $$R(A^T)$$ = r

• dim $$N(A^T)$$ = m-r

• adjugate (classical adjoint) - transpose of the cofactor matrix; compute with mini-dets

• $$A^{-1} = \text{adj}(A) / \det(A)$$

• Schur complement of $$X = \begin{bmatrix} A & B \\ C & D\end{bmatrix}$$

• $$X/D = A - BD^{-1}C$$

• $$X/A = D-CA^{-1}B$$

• for symmetric X (so $$C=B^T$$) with $$D \succ 0$$: $$X \succeq 0 \iff X/D \succeq 0$$ (checked numerically below)
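a numeric sanity check (my sketch; X symmetric PSD by construction, with D PD) that the Schur complement $$X/D$$ comes out PSD as well:

```python
import numpy as np

# build a random symmetric PSD X and split it into blocks A (3x3), B, D (2x2)
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
X = M @ M.T
A, B, D = X[:3, :3], X[:3, 3:], X[3:, 3:]

schur = A - B @ np.linalg.inv(D) @ B.T         # X/D with C = B^T (X symmetric)
print(np.all(np.linalg.eigvalsh(X) >= -1e-10))      # X is PSD
print(np.all(np.linalg.eigvalsh(schur) >= -1e-10))  # so is X/D
```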

## 3.4.2. matrix calc

• overview: think of the derivative via the first-order expansion $$f(x + \Delta) \approx f(x) + \nabla f(x)^T \Delta$$

• function f: $$\text{anything} \to \mathbb{R}$$

• gradient vector $$\nabla_A f(\mathbf{A})$$- partial derivatives with respect to each element of A (vector or matrix)

• gradient = $$\left( \frac{\partial f}{\partial A} \right)^T$$, the transpose of the numerator-layout derivative

• these next 2 assume numerator layout (numerator-major order, so numerator constant along rows)

• function f: $$\mathbb{R}^n \to \mathbb{R}^m$$

• Jacobian matrix: $$\mathbf J = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix}= \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}$$ - this is dim(f) x dim(x)

• function f: $$\mathbb{R}^n \to \mathbb{R}$$

• 2nd derivative is Hessian matrix

• $$\mathbf H = \nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[2.2ex] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\[2.2ex] \vdots & \vdots & \ddots & \vdots \\[2.2ex] \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}\end{bmatrix}$$, with entries $$H_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$

• examples

• $$\nabla_x a^T x = a$$

• $$\nabla_x x^TAx = 2Ax$$ (if A symmetric, else $$(A+A^T)x$$)

• $$\nabla_x^2 x^TAx = 2A$$ (if A symmetric, else $$A+A^T$$)

• $$\nabla_X \log \: \det X = X^{-1}$$

• we can calculate derivs of quadratic forms by calculating derivs of traces

• $$x^TAx = tr[x^TAx] = tr[xx^TA]$$

• $$\implies \frac{\partial}{\partial A} x^TAx = \frac{\partial}{\partial A} tr[xx^TA] = [xx^T]^T = xx^T$$

• useful result: $$\frac{\partial}{\partial A} log|A| = A^{-T}$$
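a finite-difference spot check (mine, not from the notes) of $$\nabla_x x^TAx = (A+A^T)x$$; since the function is quadratic, central differences are essentially exact:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))            # deliberately non-symmetric
x = rng.standard_normal(4)
f = lambda v: v @ A @ v

eps = 1e-6
# central difference along each coordinate direction
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(4)])
print(np.allclose(num_grad, (A + A.T) @ x, atol=1e-6))  # True
```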

## 3.4.3. norms

• def

1. nonnegative

2. definite: f(x) = 0 iff x = 0

3. absolute homogeneity (proportionality): $$f(k \cdot x) = |k| \cdot f(x)$$

4. triangle inequality

• properties

• convex (follows from the triangle inequality plus homogeneity)

### 3.4.3.1. vector norms

• $$L_p-$$norms: $$||x||_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$$

• $$L_0$$ norm - number of nonzero elements (this is not actually a norm!)

• $$||x||_1 = \sum |x_i|$$

• $$||x||_2$$ - Euclidean norm

• $$||x||_\infty = \max_i |x_i|$$ - also called Chebyshev norm

• P-quadratic norm: $$||x||_P = (x^TPx)^{1/2} = || P^{1/2} x ||_2$$ where $$P \in S_{++}^n$$

• dual norm

• given a norm $$|| \cdot ||$$, dual norm $$||z||_* = \sup\{ z^Tx \: | \: ||x|| \leq 1\}$$

• dual of the dual is the original

• dual of Euclidean is just Euclidean

• dual of $$l_1$$ is $$l_\infty$$

• dual of spectral norm is the nuclear norm (sum of the singular values)
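a small numeric illustration (mine) of the $$l_1$$ / $$l_\infty$$ pairing: Hölder gives $$|z^Tx| \leq ||x||_1 ||z||_\infty$$, and the sup over the unit $$l_1$$ ball is attained at a signed standard basis vector:

```python
import numpy as np

rng = np.random.default_rng(0)
x, z = rng.standard_normal(6), rng.standard_normal(6)

# Hoelder: |z^T x| <= ||x||_1 * ||z||_inf
print(abs(z @ x) <= np.linalg.norm(x, 1) * np.linalg.norm(z, np.inf))

# sup over ||x||_1 <= 1 is hit at +/- e_i for the largest |z_i|,
# which is why the dual of l1 is l_inf
i = np.argmax(np.abs(z))
x_star = np.sign(z[i]) * np.eye(6)[i]
print(np.isclose(z @ x_star, np.linalg.norm(z, np.inf)))  # True
```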

### 3.4.3.2. matrix norms

• schatten p-norms: $$||X||_p = (\sum_i \sigma^p_i(X) )^{1/p}$$ - organizes the norms below, though the schatten p itself is rarely mentioned by name

• p=1: nuclear norm = trace norm: $$||X||_* = \sum_i \sigma_i$$

• p=2: frobenius norm = euclidean norm: $$||X||_F = \sqrt {\sum_{ij} X_{ij}^2} = \sqrt{\sum_i \sigma_i^2}$$ - like vector $$L_2$$ norm

• p=$$\infty$$: spectral norm = $$\mathbf{L_2}$$-norm (of a matrix) = $$||X||_2 = \sigma_\text{max}(X)$$

• entrywise norms

• sum-absolute-value norm (like vector $$l_1$$)

• maximum-absolute-value norm (like vector $$l_\infty$$)

• operator norm

• let $$||\cdot||_a$$ and $$|| \cdot ||_b$$ be vector norms

• operator norm $$||X||_{a,b} = \sup\{ ||Xu||_a \: | \: ||u||_b \leq 1 \}$$

• represents the maximum stretching that X does to a vector u

• the induced p-norms give the spectral norm (p=2), max absolute column sum (p=1), and max absolute row sum (p=$$\infty$$); the Frobenius norm is not an operator norm (checks below)
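a sketch (illustrative) computing these norms from the singular values and from the max column / row sums, checked against numpy's built-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
s = np.linalg.svd(X, compute_uv=False)     # singular values

# Schatten norms
print(np.isclose(s.sum(), np.linalg.norm(X, 'nuc')))                # p=1 nuclear
print(np.isclose(np.sqrt((s**2).sum()), np.linalg.norm(X, 'fro')))  # p=2 Frobenius
print(np.isclose(s.max(), np.linalg.norm(X, 2)))                    # p=inf spectral

# induced (operator) norms from vector p-norms
print(np.isclose(np.abs(X).sum(axis=0).max(), np.linalg.norm(X, 1)))       # max col sum
print(np.isclose(np.abs(X).sum(axis=1).max(), np.linalg.norm(X, np.inf)))  # max row sum
```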

## 3.4.4. eigenstuff

### 3.4.4.1. eigenvalues intro - strang 5.1


• elimination changes eigenvalues

• eigenvector application to diff eqs $$\frac{du}{dt}=Au$$

• soln is exponential: $$u(t) = c_1 e^{\lambda_1 t} x_1 + c_2 e^{\lambda_2 t} x_2$$

• eigenvalue eqn: $$Ax = \lambda x \implies (A-\lambda I)x=0$$

• $$det(A-\lambda I) = 0$$ yields characteristic polynomial

• eigenvalue properties

• 0 eigenvalue $$\implies$$ A is singular

• eigenvalues are on the main diagonal when the matrix is triangular

• expressions when $$A \in \mathbb{S}^n$$ (symmetric; verified numerically below)

• $$\det(A) = \prod_i \lambda_i$$

• $$tr(A) = \sum_i \lambda_i$$

• $$||A||_2 = \max | \lambda_i |$$

• $$||A||_F = \sqrt{\sum \lambda_i^2}$$

• $$\lambda_{max} (A) = \sup_{x \neq 0} \frac{x^T A x}{x^T x}$$

• $$\lambda_{min} (A) = \inf_{x \neq 0} \frac{x^T A x}{x^T x}$$
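a quick numeric check (illustrative only) of the identities above for a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                     # symmetric
lam = np.linalg.eigvalsh(A)

print(np.isclose(np.prod(lam), np.linalg.det(A)))           # det = prod eigenvalues
print(np.isclose(np.sum(lam), np.trace(A)))                 # trace = sum eigenvalues
print(np.isclose(np.abs(lam).max(), np.linalg.norm(A, 2)))  # spectral norm

x = rng.standard_normal(4)            # Rayleigh quotient stays inside [min, max]
rq = (x @ A @ x) / (x @ x)
print(lam.min() <= rq <= lam.max())
```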

• defective matrices - lack a full set of n independent eigenvectors

• positive semi-definite: $$A \in \mathbb{R}^{n \times n}$$

• these are almost always assumed symmetric $$A=A^T$$

• all eigenvalues are nonnegative

• if $$\forall x \in R^n, x^TAx \geq 0$$ then A is positive semi definite (PSD)

• like it curves up

• Note: $$x^TAx = \sum_{i, j} x_iA_{i, j} x_j$$

• if $$\forall x \neq 0, \: x^TAx > 0$$ then A is positive definite (PD)

• PD $$\to$$ full rank, invertible

• PSD + symmetric $$\implies$$ can be written as Gram matrix $$G = X^T X$$

• if X has full column rank, then $$G$$ is PD (sketch below)

• PSD notation

• $$S^n$$ - set of symmetric matrices

• $$S^n_+$$ - set of PSD matrices

• $$S^n_{++}$$ - set of PD matrices
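a minimal sketch (mine) of the Gram-matrix fact above: $$G = X^TX$$ is always PSD, and PD exactly when X has full column rank:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # full column rank almost surely
G = X.T @ X
print(np.all(np.linalg.eigvalsh(G) > 0))   # PD: strictly positive eigenvalues

X_def = np.hstack([X[:, :2], X[:, :1]])    # repeat a column -> rank deficient
G_def = X_def.T @ X_def
print(np.isclose(np.linalg.eigvalsh(G_def).min(), 0))  # PSD but not PD
```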

### 3.4.4.2. strang 5.2 - diagonalization

• diagonalization = eigenvalue decomposition = spectral decomposition

• assume A (nxn) is symmetric

• $$A = Q \Lambda Q^T$$

• Q := eigenvectors as columns; Q is orthogonal (orthonormal columns)

• only diagonalizable if n independent eigenvectors

• not related to invertibility

• eigenvectors corresponding to different eigenvalues are lin. independent

• other Q matrices won’t produce diagonal

• there are always n complex eigenvalues (counted with multiplicity)

• orthogonal matrix $$Q^TQ=I$$

• examples

• if X, Y symmetric, $$tr(YX) = tr(Y \sum \lambda_i q_i q_i^T)$$

• lets us easily calculate $$A^2$$, $$\sqrt{A}$$ (sketch below)

• eigenvalues of $$A^2$$ are squared, eigenvectors remain same

• eigenvalues of $$A^{-1}$$ are inverse eigenvalues

• eigenvalues of the 90° rotation matrix are $$\pm i$$ (rotation by $$\theta$$ gives $$e^{\pm i\theta}$$)

• eigenvalues for $$AB$$ only multiply when A and B share eigenvectors

• diagonalizable matrices share the same eigenvector matrix S iff $$AB = BA$$

• generalized eigenvalue decomposition - for 2 symmetric matrices, with B positive definite

• $$A = V \Lambda V^T$$, $$B=VV^T$$
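a numpy sketch (illustrative; A made symmetric PSD so the square root exists) of the spectral decomposition and the easy function-of-a-matrix computations it enables:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T                                # symmetric PSD

lam, Q = np.linalg.eigh(A)                 # A = Q diag(lam) Q^T
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))

sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T    # sqrt(A) via sqrt of eigenvalues
print(np.allclose(sqrtA @ sqrtA, A))       # squares back to A
```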

### 3.4.4.3. strang 6.3 - singular value decomposition

• SVD for any nxp matrix: $$X=U \Sigma V^T$$

• U columns (nxn) are eigenvectors of $$XX^T$$

• columns of V (pxp) are eigenvectors of $$X^TX$$

• r singular values on diagonal of $$\Sigma$$ (nxp) - square roots of nonzero eigenvalues of both $$XX^T$$ and $$X^TX$$

• like rotating, scaling, and rotating back

• SVD ex. (square invertible A): $$A=UDV^T \implies A^{-1} = VD^{-1} U^T$$

• $$X = \sum_i \sigma_i u_i v_i^T$$

• properties

• for PD matrices, $$\Sigma=\Lambda$$, $$U\Sigma V^T = Q \Lambda Q^T$$

• for other symmetric matrices, any negative eigenvalues in $$\Lambda$$ become positive in $$\Sigma$$

• applications

• very numerically stable because U and V are orthogonal matrices

• condition number of invertible nxn matrix = $$\sigma_{max} / \sigma_{min}$$

• $$A=U\Sigma V^T = u_1 \sigma_1 v_1^T + ... + u_r \sigma_r v_r^T$$

• we can throw away components corresponding to small $$\sigma_i$$ (truncated-SVD sketch below)

• pseudoinverse $$A^+ = V \Sigma^+ U^T$$, where $$\Sigma^+$$ inverts the nonzero singular values
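a truncated-SVD sketch (illustrative): keep the k largest singular triplets; by Eckart-Young this is the best rank-k approximation, with spectral error equal to the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # rank-k approximation
print(np.linalg.matrix_rank(X_k))                    # 2
print(np.isclose(np.linalg.norm(X - X_k, 2), s[k]))  # error = sigma_{k+1}
```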

### 3.4.4.4. strang 5.3 - difference eqs and power $$A^k$$

• compound interest

• solving for fibonacci numbers

• Markov matrices

• steady state corresponds to $$\lambda = 1$$, the largest eigenvalue of a Markov matrix

• stability of $$u_{k+1} = A u_k$$

• stable if all eigenvalues satisfy $$|\lambda_i| < 1$$

• neutrally stable if some $$|\lambda_i| = 1$$ and the rest are $$< 1$$

• unstable if at least one $$|\lambda_i| > 1$$

• Leontief’s input-output matrix

• Perron-Frobenius thm - if A is a positive matrix (all entries positive), its largest eigenvalue is real and positive, and every component of the corresponding eigenvector is also positive

• useful for ranking, etc.

• power method: want to find the eigenvector $$v$$ corresponding to the largest eigenvalue (implementation sketch below)

• $$v = \underset{n \to \infty}{\lim} \frac{A^n v_0}{|A^nv_0|}$$ where $$v_0$$ is nonnegative
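a minimal power-method implementation (my sketch; A entrywise positive so Perron-Frobenius applies, and $$v_0$$ nonnegative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5)) + 0.1          # entrywise-positive matrix
v = np.ones(5)                        # nonnegative start

for _ in range(200):                  # iterate v <- A v / |A v|
    v = A @ v
    v /= np.linalg.norm(v)

lam = v @ A @ v                       # Rayleigh estimate of the top eigenvalue
eigs = np.linalg.eigvals(A)
print(np.isclose(lam, np.abs(eigs).max()))   # matches the spectral radius
print(np.all(v > 0))                         # Perron vector is positive
```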