# 3.4. linear algebra

## 3.4.1. linear basics

### 3.4.1.1. notation

• $$x \preceq y$$ - these are vectors and x is less than or equal to y elementwise

• $$X \preceq Y$$ - matrices, $$Y-X$$ is PSD

• $$v^TXv \leq v^TYv \:\: \forall v$$
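a minimal numpy sketch (my illustration, not from the notes) of checking the matrix order: build a PSD difference and confirm $$X \preceq Y$$ by testing the eigenvalues of $$Y-X$$:

```python
import numpy as np

# check X <= Y in the PSD (Loewner) order by testing eigenvalues of Y - X
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T                # PSD by construction (a Gram matrix)
Y = X + np.eye(4)          # Y - X = I, which is PD, so X <= Y should hold

eigs = np.linalg.eigvalsh(Y - X)   # eigvalsh: eigenvalues of a symmetric matrix
print(np.all(eigs >= -1e-10))      # True: Y - X is PSD, i.e. X <= Y
```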

### 3.4.1.2. linearity

• inner product $$\langle X, Y \rangle = tr(X^TY) = \sum_i \sum_j X_{ij} Y_{ij}$$

• like the usual inner product if we collapsed X and Y each into one big vector

• linear

• symmetric

• gives angle back: $$\cos \theta = \langle x, y \rangle \: / \: (||x|| \; ||y||)$$

• linear

1. superposition $$f(x+y) = f(x)+f(y)$$

2. proportionality $$f(k\cdot x) = k \cdot f(x)$$

• bilinear means a function of two arguments that is linear in each argument separately

• vector space

1. closed under addition and scalar multiplication

2. contains the zero vector (additive identity)

• det - signed sum of products, each taking exactly one element from every row and column

• absolute value = volume of the parallelepiped spanned by the rows (or cols); area of a parallelogram in 2D

• lin independent: $$c_1x_1+c_2x_2=0 \implies c_1=c_2=0$$

• Cauchy-Schwarz inequality: $$|x^T y| \leq ||x||_2 ||y||_2$$

• implies triangle inequality: $$||x+y||^2 \leq (||x|| + ||y||)^2$$ (both checked numerically below)
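a quick numeric spot check (illustrative only, not from the notes) of both inequalities on random vectors:

```python
import numpy as np

# verify Cauchy-Schwarz and the triangle inequality for random x, y
rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# |x^T y| <= ||x||_2 ||y||_2
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))
# ||x + y|| <= ||x|| + ||y||
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))
```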

### 3.4.1.3. matrix properties

• $$x^TAx = tr(xx^TA)$$

• nonsingular = invertible = nonzero determinant = null space is $$\{0\}$$

• defined only for square matrices

• rank of an mxn matrix - max number of linearly independent columns (equivalently rows)

• if rank = m = n, then nonsingular

• ill-conditioned matrix - matrix is close to singular, i.e. has a large condition number (a very small determinant is a common symptom)

• inverse

• orthogonal matrix: all columns are orthonormal

• $$A^{-1} = A^T$$

• preserves the Euclidean norm $$||Ax||_2 = ||x||_2$$

• if diagonal, the inverse is diagonal with each entry inverted

• inverting a 3x3 - transpose, find all the cofactor mini-dets, apply the alternating signs, divide by the det

• pseudo-inverse = Moore-Penrose inverse; for full column rank, $$A^\dagger = (A^T A)^{-1} A^T$$

• if A is nonsingular, $$A^\dagger = A^{-1}$$

• if rank(A) = m (full row rank), invert via $$A A^T$$: $$A^\dagger = A^T (A A^T)^{-1}$$

• if rank(A) = n (full column rank), invert via $$A^T A$$: $$A^\dagger = (A^T A)^{-1} A^T$$ (sketch below)

• inversion of matrix is $$\approx O(n^3)$$

• inverse of a PD symmetric matrix is also PD and symmetric

• if A, B invertible $$(AB)^{-1} = B^{-1} A^{-1}$$
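a short numpy sketch (illustrative; A gets full column rank by construction) comparing the left-inverse formula against the SVD-based `np.linalg.pinv`:

```python
import numpy as np

# tall matrix: rank(A) = n almost surely, so A^+ = (A^T A)^{-1} A^T applies
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

A_dag = np.linalg.inv(A.T @ A) @ A.T           # left inverse: A_dag @ A = I_n
print(np.allclose(A_dag @ A, np.eye(3)))       # True
print(np.allclose(A_dag, np.linalg.pinv(A)))   # matches numpy's pseudo-inverse
```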

• orthogonal complement of a subspace - the set of all vectors orthogonal to every vector in it

• define R(A) to be the range space of A (column space) and N(A) to be the null space of A, for A an mxn matrix of rank r

• $$R(A^T)$$ and $$N(A)$$ are orthogonal complements (likewise $$R(A)$$ and $$N(A^T)$$)

• dim $$R(A)$$ = r

• dim $$N(A)$$ = n-r

• dim $$R(A^T)$$ = r

• dim $$N(A^T)$$ = m-r

• adjugate (classical adjoint) - transpose of the cofactor matrix; compute with mini-dets

• $$A^{-1} = \text{adj}(A) / \det(A)$$

• Schur complement of $$X = \begin{bmatrix} A & B \\ C & D\end{bmatrix}$$

• $$X/D = A - BD^{-1}C$$

• $$X/A = D-CA^{-1}B$$

• for symmetric X (so $$C=B^T$$) with $$D \succ 0$$: $$X \succeq 0 \iff X/D \succeq 0$$ (checked numerically below)
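a numeric sanity check (my sketch; X symmetric PSD by construction, with D PD) that the Schur complement $$X/D$$ comes out PSD as well:

```python
import numpy as np

# build a random symmetric PSD X and split it into blocks A (3x3), B, D (2x2)
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
X = M @ M.T
A, B, D = X[:3, :3], X[:3, 3:], X[3:, 3:]

schur = A - B @ np.linalg.inv(D) @ B.T         # X/D with C = B^T (X symmetric)
print(np.all(np.linalg.eigvalsh(X) >= -1e-10))      # X is PSD
print(np.all(np.linalg.eigvalsh(schur) >= -1e-10))  # so is X/D
```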

## 3.4.2. matrix calc

• overview: think of the derivative via the first-order expansion $$f(x + \Delta) \approx f(x) + \nabla f(x)^T \Delta$$

• function f: $$\text{anything} \to \mathbb{R}$$

• gradient vector $$\nabla_A f(\mathbf{A})$$- partial derivatives with respect to each element of A (vector or matrix)

• gradient = $$\left( \frac{\partial f}{\partial A} \right)^T$$, the transpose of the numerator-layout derivative

• these next 2 assume numerator layout (numerator-major order, so numerator constant along rows)

• function f: $$\mathbb{R}^n \to \mathbb{R}^m$$

• Jacobian matrix: $$\mathbf J = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix}= \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}$$ - this is dim(f) x dim(x)

• function f: $$\mathbb{R}^n \to \mathbb{R}$$

• 2nd derivative is Hessian matrix

• $$\mathbf H = \nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[2.2ex] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\[2.2ex] \vdots & \vdots & \ddots & \vdots \\[2.2ex] \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}\end{bmatrix}$$, with entries $$H_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$

• examples

• $$\nabla_x a^T x = a$$

• $$\nabla_x x^TAx = 2Ax$$ (if A symmetric, else $$(A+A^T)x$$)

• $$\nabla_x^2 x^TAx = 2A$$ (if A symmetric, else $$A+A^T$$)

• $$\nabla_X \log \: \det X = X^{-1}$$

• we can calculate derivs of quadratic forms by calculating derivs of traces

• $$x^TAx = tr[x^TAx] = tr[xx^TA]$$

• $$\implies \frac{\partial}{\partial A} x^TAx = \frac{\partial}{\partial A} tr[xx^TA] = [xx^T]^T = xx^T$$

• useful result: $$\frac{\partial}{\partial A} log|A| = A^{-T}$$
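a finite-difference spot check (mine, not from the notes) of $$\nabla_x x^TAx = (A+A^T)x$$; since the function is quadratic, central differences are essentially exact:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))            # deliberately non-symmetric
x = rng.standard_normal(4)
f = lambda v: v @ A @ v

eps = 1e-6
# central difference along each coordinate direction
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(4)])
print(np.allclose(num_grad, (A + A.T) @ x, atol=1e-6))  # True
```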

## 3.4.3. norms

• def

1. nonnegative

2. definite: f(x) = 0 iff x = 0

3. absolute homogeneity (proportionality): $$f(k \cdot x) = |k| \cdot f(x)$$

4. triangle inequality

• properties

• convex (follows from the triangle inequality plus homogeneity)

### 3.4.3.1. vector norms

• $$L_p-$$norms: $$||x||_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$$

• $$L_0$$ norm - number of nonzero elements (this is not actually a norm!)

• $$||x||_1 = \sum |x_i|$$

• $$||x||_2$$ - Euclidean norm

• $$||x||_\infty = \max_i |x_i|$$ - also called Chebyshev norm

• P-quadratic norm: $$||x||_P = (x^TPx)^{1/2} = || P^{1/2} x ||_2$$ where $$P \in S_{++}^n$$

• dual norm

• given a norm $$|| \cdot ||$$, dual norm $$||z||_* = \sup\{ z^Tx \: | \: ||x|| \leq 1\}$$

• dual of the dual is the original

• dual of Euclidean is just Euclidean

• dual of $$l_1$$ is $$l_\infty$$

• dual of spectral norm is the nuclear norm (sum of the singular values)
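a small numeric illustration (mine) of the $$l_1$$ / $$l_\infty$$ pairing: Hölder gives $$|z^Tx| \leq ||x||_1 ||z||_\infty$$, and the sup over the unit $$l_1$$ ball is attained at a signed standard basis vector:

```python
import numpy as np

rng = np.random.default_rng(0)
x, z = rng.standard_normal(6), rng.standard_normal(6)

# Hoelder: |z^T x| <= ||x||_1 * ||z||_inf
print(abs(z @ x) <= np.linalg.norm(x, 1) * np.linalg.norm(z, np.inf))

# sup over ||x||_1 <= 1 is hit at +/- e_i for the largest |z_i|,
# which is why the dual of l1 is l_inf
i = np.argmax(np.abs(z))
x_star = np.sign(z[i]) * np.eye(6)[i]
print(np.isclose(z @ x_star, np.linalg.norm(z, np.inf)))  # True
```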

### 3.4.3.2. matrix norms

• schatten p-norms: $$||X||_p = (\sum_i \sigma^p_i(X) )^{1/p}$$ - organizes the norms below, though the schatten p itself is rarely mentioned by name

• p=1: nuclear norm = trace norm: $$||X||_* = \sum_i \sigma_i$$

• p=2: frobenius norm = euclidean norm: $$||X||_F = \sqrt {\sum_{ij} X_{ij}^2} = \sqrt{\sum_i \sigma_i^2}$$ - like vector $$L_2$$ norm

• p=$$\infty$$: spectral norm = $$\mathbf{L_2}$$-norm (of a matrix) = $$||X||_2 = \sigma_\text{max}(X)$$

• entrywise norms

• sum-absolute-value norm (like vector $$l_1$$)

• maximum-absolute-value norm (like vector $$l_\infty$$)

• operator norm

• let $$||\cdot||_a$$ and $$|| \cdot ||_b$$ be vector norms

• operator norm $$||X||_{a,b} = \sup\{ ||Xu||_a \: | \: ||u||_b \leq 1 \}$$

• represents the maximum stretching that X does to a vector u

• the induced p-norms give the spectral norm (p=2), max absolute column sum (p=1), and max absolute row sum (p=$$\infty$$); the Frobenius norm is not an operator norm (checks below)
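a sketch (illustrative) computing these norms from the singular values and from the max column / row sums, checked against numpy's built-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
s = np.linalg.svd(X, compute_uv=False)     # singular values

# Schatten norms
print(np.isclose(s.sum(), np.linalg.norm(X, 'nuc')))                # p=1 nuclear
print(np.isclose(np.sqrt((s**2).sum()), np.linalg.norm(X, 'fro')))  # p=2 Frobenius
print(np.isclose(s.max(), np.linalg.norm(X, 2)))                    # p=inf spectral

# induced (operator) norms from vector p-norms
print(np.isclose(np.abs(X).sum(axis=0).max(), np.linalg.norm(X, 1)))       # max col sum
print(np.isclose(np.abs(X).sum(axis=1).max(), np.linalg.norm(X, np.inf)))  # max row sum
```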

## 3.4.4. eigenstuff

### 3.4.4.1. eigenvalues intro - strang 5.1


• elimination changes eigenvalues

• eigenvector application to diff eqs $$\frac{du}{dt}=Au$$

• soln is exponential: $$u(t) = c_1 e^{\lambda_1 t} x_1 + c_2 e^{\lambda_2 t} x_2$$

• eigenvalue eqn: $$Ax = \lambda x \implies (A-\lambda I)x=0$$

• $$det(A-\lambda I) = 0$$ yields characteristic polynomial

• eigenvalue properties

• 0 eigenvalue $$\implies$$ A is singular

• eigenvalues are on the main diagonal when the matrix is triangular

• expressions when $$A \in \mathbb{S}^n$$ (symmetric; verified numerically below)

• $$\det(A) = \prod_i \lambda_i$$

• $$tr(A) = \sum_i \lambda_i$$

• $$||A||_2 = \max | \lambda_i |$$

• $$||A||_F = \sqrt{\sum \lambda_i^2}$$

• $$\lambda_{max} (A) = \sup_{x \neq 0} \frac{x^T A x}{x^T x}$$

• $$\lambda_{min} (A) = \inf_{x \neq 0} \frac{x^T A x}{x^T x}$$
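a quick numeric check (illustrative only) of the identities above for a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                     # symmetric
lam = np.linalg.eigvalsh(A)

print(np.isclose(np.prod(lam), np.linalg.det(A)))           # det = prod eigenvalues
print(np.isclose(np.sum(lam), np.trace(A)))                 # trace = sum eigenvalues
print(np.isclose(np.abs(lam).max(), np.linalg.norm(A, 2)))  # spectral norm

x = rng.standard_normal(4)            # Rayleigh quotient stays inside [min, max]
rq = (x @ A @ x) / (x @ x)
print(lam.min() <= rq <= lam.max())
```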

• defective matrices - lack a full set of n independent eigenvectors

• positive semi-definite: $$A \in \mathbb{R}^{n \times n}$$

• these are almost always assumed symmetric $$A=A^T$$

• all eigenvalues are nonnegative

• if $$\forall x \in R^n, x^TAx \geq 0$$ then A is positive semi definite (PSD)

• like it curves up

• Note: $$x^TAx = \sum_{i, j} x_iA_{i, j} x_j$$

• if $$\forall x \neq 0, \: x^TAx > 0$$ then A is positive definite (PD)

• PD $$\to$$ full rank, invertible

• PSD + symmetric $$\implies$$ can be written as Gram matrix $$G = X^T X$$

• if X has full column rank, then $$G$$ is PD (sketch below)

• PSD notation

• $$S^n$$ - set of symmetric matrices

• $$S^n_+$$ - set of PSD matrices

• $$S^n_{++}$$ - set of PD matrices
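a minimal sketch (mine) of the Gram-matrix fact above: $$G = X^TX$$ is always PSD, and PD exactly when X has full column rank:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # full column rank almost surely
G = X.T @ X
print(np.all(np.linalg.eigvalsh(G) > 0))   # PD: strictly positive eigenvalues

X_def = np.hstack([X[:, :2], X[:, :1]])    # repeat a column -> rank deficient
G_def = X_def.T @ X_def
print(np.isclose(np.linalg.eigvalsh(G_def).min(), 0))  # PSD but not PD
```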

### 3.4.4.2. strang 5.2 - diagonalization

• diagonalization = eigenvalue decomposition = spectral decomposition

• assume A (nxn) is symmetric

• $$A = Q \Lambda Q^T$$

• Q := eigenvectors as columns; Q is orthogonal (orthonormal columns)

• only diagonalizable if n independent eigenvectors

• not related to invertibility

• eigenvectors corresponding to different eigenvalues are lin. independent

• other Q matrices won’t produce diagonal

• there are always n complex eigenvalues (counted with multiplicity)

• orthogonal matrix $$Q^TQ=I$$

• examples

• if X, Y symmetric, $$tr(YX) = tr(Y \sum \lambda_i q_i q_i^T)$$

• lets us easily calculate $$A^2$$, $$\sqrt{A}$$ (sketch below)

• eigenvalues of $$A^2$$ are squared, eigenvectors remain same

• eigenvalues of $$A^{-1}$$ are inverse eigenvalues

• eigenvalues of the 90° rotation matrix are $$\pm i$$ (rotation by $$\theta$$ gives $$e^{\pm i\theta}$$)

• eigenvalues for $$AB$$ only multiply when A and B share eigenvectors

• diagonalizable matrices share the same eigenvector matrix S iff $$AB = BA$$

• generalized eigenvalue decomposition - for 2 symmetric matrices, with B positive definite

• $$A = V \Lambda V^T$$, $$B=VV^T$$
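a numpy sketch (illustrative; A made symmetric PSD so the square root exists) of the spectral decomposition and the easy function-of-a-matrix computations it enables:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T                                # symmetric PSD

lam, Q = np.linalg.eigh(A)                 # A = Q diag(lam) Q^T
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))

sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T    # sqrt(A) via sqrt of eigenvalues
print(np.allclose(sqrtA @ sqrtA, A))       # squares back to A
```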

### 3.4.4.3. strang 6.3 - singular value decomposition

• SVD for any nxp matrix: $$X=U \Sigma V^T$$

• U columns (nxn) are eigenvectors of $$XX^T$$

• columns of V (pxp) are eigenvectors of $$X^TX$$

• r singular values on diagonal of $$\Sigma$$ (nxp) - square roots of nonzero eigenvalues of both $$XX^T$$ and $$X^TX$$

• like rotating, scaling, and rotating back

• SVD ex. (square invertible A): $$A=UDV^T \implies A^{-1} = VD^{-1} U^T$$

• $$X = \sum_i \sigma_i u_i v_i^T$$

• properties

• for PD matrices, $$\Sigma=\Lambda$$, $$U\Sigma V^T = Q \Lambda Q^T$$

• for other symmetric matrices, any negative eigenvalues in $$\Lambda$$ become positive in $$\Sigma$$

• applications

• very numerically stable because U and V are orthogonal matrices

• condition number of invertible nxn matrix = $$\sigma_{max} / \sigma_{min}$$

• $$A=U\Sigma V^T = u_1 \sigma_1 v_1^T + ... + u_r \sigma_r v_r^T$$

• we can throw away components corresponding to small $$\sigma_i$$ (truncated-SVD sketch below)

• pseudoinverse $$A^+ = V \Sigma^+ U^T$$, where $$\Sigma^+$$ inverts the nonzero singular values
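a truncated-SVD sketch (illustrative): keep the k largest singular triplets; by Eckart-Young this is the best rank-k approximation, with spectral error equal to the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # rank-k approximation
print(np.linalg.matrix_rank(X_k))                    # 2
print(np.isclose(np.linalg.norm(X - X_k, 2), s[k]))  # error = sigma_{k+1}
```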

### 3.4.4.4. strang 5.3 - difference eqs and power $$A^k$$

• compound interest

• solving for fibonacci numbers

• Markov matrices

• steady state corresponds to $$\lambda = 1$$, the largest eigenvalue of a Markov matrix

• stability of $$u_{k+1} = A u_k$$

• stable if all eigenvalues satisfy $$|\lambda_i| < 1$$

• neutrally stable if some $$|\lambda_i| = 1$$ and the rest are $$< 1$$

• unstable if at least one $$|\lambda_i| > 1$$

• Leontief’s input-output matrix

• Perron-Frobenius thm - if A is a positive matrix (all entries positive), its largest eigenvalue is real and positive, and every component of the corresponding eigenvector is also positive

• useful for ranking, etc.

• power method: want to find the eigenvector $$v$$ corresponding to the largest eigenvalue (implementation sketch below)

• $$v = \underset{n \to \infty}{\lim} \frac{A^n v_0}{|A^nv_0|}$$ where $$v_0$$ is nonnegative
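a minimal power-method implementation (my sketch; A entrywise positive so Perron-Frobenius applies, and $$v_0$$ nonnegative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5)) + 0.1          # entrywise-positive matrix
v = np.ones(5)                        # nonnegative start

for _ in range(200):                  # iterate v <- A v / |A v|
    v = A @ v
    v /= np.linalg.norm(v)

lam = v @ A @ v                       # Rayleigh estimate of the top eigenvalue
eigs = np.linalg.eigvals(A)
print(np.isclose(lam, np.abs(eigs).max()))   # matches the spectral radius
print(np.all(v > 0))                         # Perron vector is positive
```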