2.6. algorithms

Some notes on algorithms following the book Introduction to algorithms and based on UVA’s course.

2.6.1. asymptotics

  • Big-O

    • big-oh: O(g): functions that grow no faster than g - upper bound, runs in time less than g

      • \(f(n) \leq c\cdot g(n)​\) for some c, large n

      • set of functions s.t. there exists c,k>0, 0 ≤ f(n) ≤ c*g(n), for all n > k

    • big-theta: Θ(g): functions that grow at the same rate as g

      • big-oh(g) and big-theta(g) - asymptotic tight bound

    • big-omega: Ω(g): functions that grow at least as fast as g

      • f(n)≥c*g(n) for some c, large n

    • Example: f = 57n+3

      • O(n^2) - or anything bigger

      • Θ(n)

      • Ω(n^.5) - or anything smaller

      • input must be positive

      • We always analyze the worst case run-time

    • little-omega: omega(g) - functions that grow faster than g

    • little-o: o(g) - functions that grow slower than g

      • we write f(n) ∈ O(g(n)), not f(n) = O(g(n))

    • They are all reflexive and transitive, but only Θ is symmetric.

      • Θ defines an equivalence relation.

  • add 2 functions, growth rate will be \(O(max(g_1(n)+g_2(n))\) (same for sequential code)

  • recurrence thm:\( f(n) = O(n^c) => T(n) = 0(n^c)\)

    • \(T(n) = a*T(n/b) + f(n)\)

    • \(c = log_b(a)\)

  • over bounded number of elements, almost everything is constant time

2.6.2. recursion

  • moving down/right on an NxN grid - each path has length (N-1)+(N-1)

    • we must move right N-1 times

    • ans = (N-1+N-1 choose N-1)

    • for recursion, if a list is declared outside static recursive method, it shouldn’t be static

  • generate permutations - recursive, add char at each spot

  • think hard about the base case before starting

    • look for lengths that you know

    • look for symmetry

  • n-queens - one array of length n, go row by row

2.6.3. dynamic programming

//returns max value for knapsack of capacity W, weights wt, vals val
int knapSack(int W, int wt[], int val[])
int n = wt.length;
int K[n+1][W+1];
//build table K[][] in bottom up manner
for (int i = 0; i <= n; i++)
   for (int w = 0; w <= W; w++)
	   if $(i==0 \vert \vert  w==0)$ // base case
		   K[i][w] = 0;
	   else if (wt[i-1] <= w) //max of including weight, not including
		   K[i][w] = max(val[i-1] + K[i-1][w-wt[i-1]], K[i-1][w]);
	   else //weight too large
		   K[i][w] = K[i-1][w];
return K[n][W];

2.6.4. min-cut / max-cut

2.6.5. hungarian

  • assign N things to N targets, each with an associated cost

2.6.6. max-flow

  • A list of pipes is given, with different flow-capacities. These pipes are connected at their endpoints. What is the maximum amount of water that you can route from a given starting point to a given ending point?

2.6.7. sorting

  • you can assume w.l.o.g. all input numbers are unique

  • sorting requires Ω(nlog n) (proof w/ tree)

    • considerations: worst case, average, in practice, input distribution, stability (order coming in is preserved for things with same keys), in-situ (in-place), stack depth, having to read/write to disk (disk is much slower), parallelizable, online (more data coming in)

  • adaptive - changes its behavior based on input (ex. bubble sort will stop)

2.6.7.1. comparised-based

2.6.7.1.1. bubble sort

  • keep swapping adjacent pairs

for i=1:n-1
	if a[i+1]<a[i]
		swap(a,i,i+1)
  • have a flag that tells if you did no swaps - done

  • number of passes ~ how far elements are from their final positions

  • O(n^2)

2.6.7.1.2. odd-even sort

  • swap even pairs

  • then swap odd pairs

  • parallelizable

2.6.7.1.3. selection sort

  • move largest to current position for i=1:n-1 for j=1:n x = max(x,a[j]) jmax = j swap(a,i,j)

  • 0(n^2)

2.6.7.1.4. insertion sort

  • insert each item into lists for i=2:n insert a[i] into a[1..(i-1)] shift

  • O(n^2), O(nk) where k is max dist from final position

  • best when alsmost sorted

2.6.7.1.5. heap sort

  • insert everything into heap

  • kepp remove max

  • can do in place by storing everything in array

  • can use any height-balanced tree instead of heap

    • traverse tree to get order

    • ex. B-tree: multi-rotations occur infrequently, average O(log n) height

  • 0(n log n)

2.6.7.1.6. smooth sort

  • adaptive heapsort

  • collection of heaps (each one is a factor larger than the one before)

  • can add and remove in essentially constant time if data is in order

2.6.7.1.7. merge sort

  • split into smaller arrays, sort, merge

  • T(n) = 2T(n/2) + n = 0(n log n)

  • stable, parallelizable (if parallel, not in place)

2.6.7.1.8. quicksort

  • split on pivot, put smaller elements on left, larger on right

  • O(n log n) average, O(n^2) worst

  • O(log n) space

2.6.7.1.9. shell sort

  • generalize insertion sort

  • insertion-sort all items i apart where i starts big and then becomes small

    • sorted after last pass (i=1)

  • O(n^2), O(n^(3/2), … unsure what complexity is

    • no matter what must be more than n log n

  • not used much in practice

2.6.7.2. not comparison-based

2.6.7.2.1. counting sort

  • use values as array indices in new sort

  • keep count of number of times at each index

  • for specialized data only, need small values

  • 0(n) time, 0(k) space

2.6.7.2.2. bucket sort

  • spread data into buckets based on value

  • sort the buckets

  • O(n+k) time

  • buckets could be trees

2.6.7.2.3. radix sort

  • sort each digit in turn

  • stable sort on each digit

    • like bucket sort d times

  • 0(d*n time), 0(k+n) space

2.6.7.2.4. meta sort

  • like quicksort, but 0(nlogn) worst case

  • run quicksort, mergesort in parallel

    • stop when one stops

  • there is an overhead but doesn’t affect big-oh analysis

  • ave, worst-cast = O(n log n)

2.6.7.3. sorting overview

  • in exceptional cases insertion-sort or radix-sort are much better than the generic QuickSort / MergeSort / HeapSort answers.

  • merge a and b sorted - start from the back

2.6.8. searching

  • binary sort can’t do better than linear if there are duplicates

  • if data is too large, we need to do external sort (sort parts of it and write them back to file)

  • write binary search recursively

    • use low<= val and high >=val so you get correct bounds

    • binary search with empty strings - make sure that there is an element at the end of it

  • “a”.compareTo(“b”) is -1

  • we always round up for these

  • finding minimum is Ω(n)

    • pf: assume an element was ignored, that element could have been minimum

    • simple algorithm - keep track of best so far

    • thm: n/2 comparisons are necessary because each comparison involves 2 elements

    • thm: n-1 comparisons are necessary - need to keep track of knowledge gained

      • every non-min element must win atleast once (move from unkown to known)

  • find min and max

    • naive solution has 2n-2 comparison

    • pairwise compare all elements, array of maxes, array of mins = n/2 comparisons

      • check min array, max array = 2* (n/2-1)

    • 3n/2-2 comparisons are sufficient (and necessary)

      • pf: 4 categories (not tested, only won, only lost, both)

      • not tested-> w or l =n/2 comparisons

      • w or l -> both = n/2-1

      • therefore 3n/2-2 comparisons necessary

  • find max and next-to-max

    • thm: n-2 + log(n) comparisons are sufficient

    • consider elimination tournament, pairwise compare elements repeatedly

      • 2nd best must have played best at some point - look for it in log(n)

  • selection - find ith largest integer

    • repeatedly finding median finds ith largest

    • finding median linear yields ith largest linear

      • T(n) = T(n/2) + M(n) where M(n) is time to find median

    • quickselect - partition around pivot and recur

      • average time linear, worst case O(n^2)

  • median in linear time - quickly eliminate a constant fraction and repeat

    • partition into n/5 groups of 5

      • sort each group high to low

      • find median of each group

      • compute median of medians recursively

      • move groups with larger medians to right

        • move groups with smaller medians to left

      • now we know 3/10 of elements larger than median of medians

        • 3/10 of elements smaller than median of medians

      • partition all elements around median of medians

        • recur like quickselect

        • guarantees each partition contains at most 7n/10 elements

      • T(n) = T(n/5) + T(7n/10) + O(n) -> f(x+y)≥f(x)+f(y)

      • T(n) ≤ T(9n/10) + O(n) -> this had to be less than T(n)

2.6.9. computational geometry

  • range queries

    • input = n points (vectors) with preprocessing

    • output - number of points within any query rectangle

    • 1D

      • range query is a pair of binary searches

      • O(log n) time per query

      • O(n) space, O(n log n) preprocessing time

    • 2D

      • subtract out rectangles you don’t want

      • add back things you double subtracted

      • we want rectangles anchored at origin

    • nD

      • make regions by making a grid that includes all points

      • precompute southwest counts for all regions - different ways to do this - tradeoffs between space and time

      • O(log n) time per query (after precomputing) - binary search x,y

  • polygon-point intersection

    • polygon - a closed sequence of segments

    • simple polygon - has no intersections

    • thm (Jordan) - a simple polygon partitions the plane into 3 regions: interior, exterior, boundary

    • convex polygon - intersection of half-planes

    • polytope - higher-dimensional polygon

    • raycasting

      • intersections = odd - interior, even - exterior

      • check for tangent lines, intersecting corners

      • O(n) time per query, O(1) space and time

    • convex case

      • preprocessing

        • find an interior point p (pick a vertext or average the vertices)

        • partition into wedges (slicing through vertices) w.r.t. p

        • sort wedges by polar angle

      • query

        • find containing wedge (look up by angle)

        • test interior / exterior

          • check triangle - cast ray from p to point, see if it crosses edge

      • O(log n) time per query (we binary search the wedges)

      • O(n) space and O(n log n) preprocessing time

    • non-convex case

      • preprocessing

        • sort vertices by x

        • find vertical slices

        • partition into trapezoids (triangle is trapezoid)

        • sort slice trapezoids by y

      • query

        • find containing slice

        • find trapezoid in slice

        • report interior/ exterior

      • O(log n) time per query (two binary searches)

      • O(n^2) space and O(n^2) preprocessing time

  • convex hull

    • input: set of n points

    • output: smallest containing convex polygon

    • simple solution 1 - Jarvis’s march

    • simple solution 2 - Graham’s scan

    • mergehull

      • partition into two sets - computer MergeHull of each set

      • merge the two resulting CHS

        • pick point p with least x

        • form angle-monotone chains w.r.t p

        • merge chains into angle-sorted list

        • run Graham’s scan to form CH

      • T(n) = 2T(n/2) + n = 0(n log n)

      • generalizes to higher dimensions

      • parallelizes

    • quickhull (like quicksort)

      • find right and left-most points

        • partition points along this line

        • find points farthest from line - make quadrilateral

          • eliminate all internal points

        • recurse on 4 remaining regions

        • concatenate resulting CHs

      • O(n log n) expected time

      • O(n^2) worst-case time - ex. circle

      • generalizes to higher dim, parallelizes

    • lower bound - CH requires Ω(n log n) comparisons

      • pf - reduce sorting to convex hull

      • consider arbitrary set of x_i to be sorted

      • raise the x_i to the parabola (x_i,x_i^2) - could be any concave function

      • compute convex hull of the parabola - all connected and line on top

      • from convex hull we can get sorted x_i => convex hull did sorting so at least n log n comparisons

      • corollary - Graham’s scan is optimal

    • Chan’s convex hull algorithm

      • assume we know CH size m=h

      • partitoin points into n/m sets of m each

  • convex polygon diameter

  • Voronoi diagrams - input n points - takes O(nlogn) time to compute

    • problems that are solved

      • Voronoi cell - the set of points closer to any given point than all others form a convex polygon

      • generalizes to other metrics (not just Euclidean distance)

      • a Voronoi cell is unbounded if and only if it’s point is on the convex hull

        • corollary - convex hull can be computed in linear time

      • Voronoi diagram has at most 2n-5 vertices and 3n-6 edges

      • every nearest neighbor of a point defines an edge of the Voronoi diagram

        • corollary - all nearest neighbors can be computed from the Voronoi diagram in linear time

        • corollary - nearest neighbor search in O(log n) time using planar subdivision search (binary search in 2D)

      • connection points of neighboring Voronoi diagram cells form a triangulation (Delanuay triangulation)

      • a Delanuay triangulation maximizes the minimum angle over all triangulations - no long slivery triangles

        • Euclidean minimum spanning tree is a subset of the Delanuay triangulation (can be computed easily)

    • calculating Voronoi diagram

      • discrete case / bitmap - expand breadth-first waves from all points

        • time is O(bitmap size)

        • time is independent of #points

      • intersecting half planes

        • Voronoi cell of a point is intersection of all half-planes induced by the perpendicular bisectors w.r.t all other points

        • use intersection of convex polygons to intersect half-planes (nlogn time per cell)

      • can be computed in nlogn total time

        1. idea divide and conquer

        • merging is complex

        1. sweep line using parabolas