Algorithms view markdown
Some notes on algorithms following the book Introduction to algorithms and based on UVA's course.
asymptotics
- Big-O
- big-oh: O(g): functions that grow no faster than g - upper bound, runs in time less than g
- $f(n) \leq c\cdot g(n)$ for some c, large n
- set of functions s.t. there exists c,k>0, 0 ≤ f(n) ≤ c*g(n), for all n > k
- big-theta: $\Theta (g)$: functions that grow at the same rate as g
- big-oh(g) and big-theta(g) - asymptotic tight bound
- big-omega: $\Omega(g)$: functions that grow at least as fast as g
- f(n)≥c*g(n) for some c, large n
- Example: f = 57n+3
- O(n^2) - or anything bigger
- Θ(n)
- Ω(n^.5) - or anything smaller
- input must be positive
- We always analyze the worst case run-time
- little-omega: omega(g) - functions that grow faster than g
- little-o: o(g) - functions that grow slower than g
- we write f(n) ∈ O(g(n)), not f(n) = O(g(n))
- They are all reflexive and transitive, but only Θ is symmetric.
- Θ defines an equivalence relation.
- big-oh: O(g): functions that grow no faster than g - upper bound, runs in time less than g
- add 2 functions, growth rate will be $O(max(g_1(n)+g_2(n))$ (same for sequential code)
- recurrence thm:$ f(n) = O(n^c) => T(n) = 0(n^c)$
- $T(n) = a*T(n/b) + f(n)$
- $c = log_b(a)$
- over bounded number of elements, almost everything is constant time
recursion
- moving down/right on an NxN grid - each path has length (N-1)+(N-1)
- we must move right N-1 times
- ans = (N-1+N-1 choose N-1)
- for recursion, if a list is declared outside static recursive method, it shouldn’t be static
- generate permutations - recursive, add char at each spot
- think hard about the base case before starting
- look for lengths that you know
- look for symmetry
- n-queens - one array of length n, go row by row
dynamic programming
//returns max value for knapsack of capacity W, weights wt, vals val
int knapSack(int W, int wt[], int val[])
int n = wt.length;
int K[n+1][W+1];
//build table K[][] in bottom up manner
for (int i = 0; i <= n; i++)
for (int w = 0; w <= W; w++)
if $(i==0 \vert \vert w==0)$ // base case
K[i][w] = 0;
else if (wt[i-1] <= w) //max of including weight, not including
K[i][w] = max(val[i-1] + K[i-1][w-wt[i-1]], K[i-1][w]);
else //weight too large
K[i][w] = K[i-1][w];
return K[n][W];
min-cut / max-cut
hungarian
- assign N things to N targets, each with an associated cost
max-flow
- A list of pipes is given, with different flow-capacities. These pipes are connected at their endpoints. What is the maximum amount of water that you can route from a given starting point to a given ending point?
sorting
- you can assume w.l.o.g. all input numbers are unique
- sorting requires Ω(nlog n) (proof w/ tree)
- considerations: worst case, average, in practice, input distribution, stability (order coming in is preserved for things with same keys), in-situ (in-place), stack depth, having to read/write to disk (disk is much slower), parallelizable, online (more data coming in)
- adaptive - changes its behavior based on input (ex. bubble sort will stop)
comparised-based
bubble sort
- keep swapping adjacent pairs
for i=1:n-1
if a[i+1]<a[i]
swap(a,i,i+1)
- have a flag that tells if you did no swaps - done
- number of passes ~ how far elements are from their final positions
- O(n^2)
odd-even sort
- swap even pairs
- then swap odd pairs
- parallelizable
selection sort
- move largest to current position for i=1:n-1 for j=1:n x = max(x,a[j]) jmax = j swap(a,i,j)
- 0(n^2)
insertion sort
- insert each item into lists for i=2:n insert a[i] into a[1..(i-1)] shift
- O(n^2), O(nk) where k is max dist from final position
- best when alsmost sorted
heap sort
- insert everything into heap
- kepp remove max
- can do in place by storing everything in array
- can use any height-balanced tree instead of heap
- traverse tree to get order
- ex. B-tree: multi-rotations occur infrequently, average O(log n) height
- 0(n log n)
smooth sort
- adaptive heapsort
- collection of heaps (each one is a factor larger than the one before)
- can add and remove in essentially constant time if data is in order
merge sort
- split into smaller arrays, sort, merge
- T(n) = 2T(n/2) + n = 0(n log n)
- stable, parallelizable (if parallel, not in place)
quicksort
- split on pivot, put smaller elements on left, larger on right
- O(n log n) average, O(n^2) worst
- O(log n) space
shell sort
- generalize insertion sort
- insertion-sort all items i apart where i starts big and then becomes small
- sorted after last pass (i=1)
- O(n^2), O(n^(3/2), … unsure what complexity is
- no matter what must be more than n log n
- not used much in practice
not comparison-based
counting sort
- use values as array indices in new sort
- keep count of number of times at each index
- for specialized data only, need small values
- 0(n) time, 0(k) space
bucket sort
- spread data into buckets based on value
- sort the buckets
- O(n+k) time
- buckets could be trees
radix sort
- sort each digit in turn
- stable sort on each digit
- like bucket sort d times
- 0(d*n time), 0(k+n) space
meta sort
- like quicksort, but 0(nlogn) worst case
- run quicksort, mergesort in parallel
- stop when one stops
- there is an overhead but doesn’t affect big-oh analysis
- ave, worst-cast = O(n log n)
sorting overview
- in exceptional cases insertion-sort or radix-sort are much better than the generic QuickSort / MergeSort / HeapSort answers.
- merge a and b sorted - start from the back
searching
- binary sort can’t do better than linear if there are duplicates
- if data is too large, we need to do external sort (sort parts of it and write them back to file)
- write binary search recursively
- use low<= val and high >=val so you get correct bounds
- binary search with empty strings - make sure that there is an element at the end of it
- “a”.compareTo(“b”) is -1
- we always round up for these
- finding minimum is Ω(n)
- pf: assume an element was ignored, that element could have been minimum
- simple algorithm - keep track of best so far
- thm: n/2 comparisons are necessary because each comparison involves 2 elements
- thm: n-1 comparisons are necessary - need to keep track of knowledge gained
- every non-min element must win atleast once (move from unkown to known)
- find min and max
- naive solution has 2n-2 comparison
- pairwise compare all elements, array of maxes, array of mins = n/2 comparisons
- check min array, max array = 2* (n/2-1)
- 3n/2-2 comparisons are sufficient (and necessary)
- pf: 4 categories (not tested, only won, only lost, both)
- not tested-> w or l =n/2 comparisons
- w or l -> both = n/2-1
- therefore 3n/2-2 comparisons necessary
- find max and next-to-max
- thm: n-2 + log(n) comparisons are sufficient
- consider elimination tournament, pairwise compare elements repeatedly
- 2nd best must have played best at some point - look for it in log(n)
- selection - find ith largest integer
- repeatedly finding median finds ith largest
- finding median linear yields ith largest linear
- T(n) = T(n/2) + M(n) where M(n) is time to find median
- quickselect - partition around pivot and recur
- average time linear, worst case O(n^2)
- median in linear time - quickly eliminate a constant fraction and repeat
- partition into n/5 groups of 5
- sort each group high to low
- find median of each group
- compute median of medians recursively
- move groups with larger medians to right
- move groups with smaller medians to left
- now we know 3/10 of elements larger than median of medians
- 3/10 of elements smaller than median of medians
- partition all elements around median of medians
- recur like quickselect
- guarantees each partition contains at most 7n/10 elements
- T(n) = T(n/5) + T(7n/10) + O(n) -> f(x+y)≥f(x)+f(y)
- T(n) ≤ T(9n/10) + O(n) -> this had to be less than T(n)
- partition into n/5 groups of 5
computational geometry
- range queries
- input = n points (vectors) with preprocessing
- output - number of points within any query rectangle
- 1D
- range query is a pair of binary searches
- O(log n) time per query
- O(n) space, O(n log n) preprocessing time
- 2D
- subtract out rectangles you don’t want
- add back things you double subtracted
- we want rectangles anchored at origin
- nD
- make regions by making a grid that includes all points
- precompute southwest counts for all regions - different ways to do this - tradeoffs between space and time
- O(log n) time per query (after precomputing) - binary search x,y
- polygon-point intersection
- polygon - a closed sequence of segments
- simple polygon - has no intersections
- thm (Jordan) - a simple polygon partitions the plane into 3 regions: interior, exterior, boundary
- convex polygon - intersection of half-planes
- polytope - higher-dimensional polygon
- raycasting
- intersections = odd - interior, even - exterior
- check for tangent lines, intersecting corners
- O(n) time per query, O(1) space and time
- convex case
- preprocessing
- find an interior point p (pick a vertext or average the vertices)
- partition into wedges (slicing through vertices) w.r.t. p
- sort wedges by polar angle
- query
- find containing wedge (look up by angle)
- test interior / exterior
- check triangle - cast ray from p to point, see if it crosses edge
- O(log n) time per query (we binary search the wedges)
- O(n) space and O(n log n) preprocessing time
- preprocessing
- non-convex case
- preprocessing
- sort vertices by x
- find vertical slices
- partition into trapezoids (triangle is trapezoid)
- sort slice trapezoids by y
- query
- find containing slice
- find trapezoid in slice
- report interior/ exterior
- O(log n) time per query (two binary searches)
- O(n^2) space and O(n^2) preprocessing time
- preprocessing
- convex hull
- input: set of n points
- output: smallest containing convex polygon
- simple solution 1 - Jarvis’s march
- simple solution 2 - Graham’s scan
- mergehull
- partition into two sets - computer MergeHull of each set
- merge the two resulting CHS
- pick point p with least x
- form angle-monotone chains w.r.t p
- merge chains into angle-sorted list
- run Graham’s scan to form CH
- T(n) = 2T(n/2) + n = 0(n log n)
- generalizes to higher dimensions
- parallelizes
- quickhull (like quicksort)
- find right and left-most points
- partition points along this line
- find points farthest from line - make quadrilateral
- eliminate all internal points
- recurse on 4 remaining regions
- concatenate resulting CHs
- O(n log n) expected time
- O(n^2) worst-case time - ex. circle
- generalizes to higher dim, parallelizes
- find right and left-most points
- lower bound - CH requires Ω(n log n) comparisons
- pf - reduce sorting to convex hull
- consider arbitrary set of x_i to be sorted
- raise the x_i to the parabola (x_i,x_i^2) - could be any concave function
- compute convex hull of the parabola - all connected and line on top
- from convex hull we can get sorted x_i => convex hull did sorting so at least n log n comparisons
- corollary - Graham’s scan is optimal
- Chan’s convex hull algorithm
- assume we know CH size m=h
- partitoin points into n/m sets of m each
- convex polygon diameter
- Voronoi diagrams - input n points - takes O(nlogn) time to compute
- problems that are solved
- Voronoi cell - the set of points closer to any given point than all others form a convex polygon
- generalizes to other metrics (not just Euclidean distance)
- a Voronoi cell is unbounded if and only if it’s point is on the convex hull
- corollary - convex hull can be computed in linear time
- Voronoi diagram has at most 2n-5 vertices and 3n-6 edges
- every nearest neighbor of a point defines an edge of the Voronoi diagram
- corollary - all nearest neighbors can be computed from the Voronoi diagram in linear time
- corollary - nearest neighbor search in O(log n) time using planar subdivision search (binary search in 2D)
- connection points of neighboring Voronoi diagram cells form a triangulation (Delanuay triangulation)
- a Delanuay triangulation maximizes the minimum angle over all triangulations - no long slivery triangles
- Euclidean minimum spanning tree is a subset of the Delanuay triangulation (can be computed easily)
- calculating Voronoi diagram
- discrete case / bitmap - expand breadth-first waves from all points
- time is O(bitmap size)
- time is independent of #points
- intersecting half planes
- Voronoi cell of a point is intersection of all half-planes induced by the perpendicular bisectors w.r.t all other points
- use intersection of convex polygons to intersect half-planes (nlogn time per cell)
- can be computed in nlogn total time
- idea divide and conquer
- merging is complex
- sweep line using parabolas
- idea divide and conquer
- discrete case / bitmap - expand breadth-first waves from all points
- problems that are solved