# 2.6. algorithms
Some notes on algorithms, following the book *Introduction to Algorithms* and based on UVA's course.
## 2.6.1. asymptotics
Big-O
big-oh: O(g): functions that grow no faster than g - an upper bound; runs in time at most a constant multiple of g
\(f(n) \leq c \cdot g(n)\) for some c, large n
the set of functions s.t. there exist c, k > 0 with \(0 \leq f(n) \leq c \cdot g(n)\) for all n > k
big-theta: Θ(g): functions that grow at the same rate as g
in both big-oh(g) and big-omega(g) - an asymptotically tight bound
big-omega: Ω(g): functions that grow at least as fast as g
\(f(n) \geq c \cdot g(n)\) for some c, large n
Example: \(f(n) = 57n + 3\)
\(O(n^2)\) - or anything bigger
\(\Theta(n)\)
\(\Omega(n^{0.5})\) - or anything smaller
input must be positive - these definitions only constrain behavior for large n
We always analyze the worst case run-time
little-omega: ω(g) - functions that grow strictly faster than g
little-o: o(g) - functions that grow strictly slower than g
we write \(f(n) \in O(g(n))\), not \(f(n) = O(g(n))\)
O, Ω, and Θ are reflexive and transitive (o and ω are only transitive), but only Θ is symmetric.
Θ defines an equivalence relation.
adding 2 functions, the growth rate will be \(O(\max(g_1(n), g_2(n)))\) (same for sequential code)
recurrence thm (master theorem): for \(T(n) = a \cdot T(n/b) + f(n)\), let \(c = \log_b(a)\); then
\(f(n) = O(n^{c-\epsilon}) \Rightarrow T(n) = \Theta(n^c)\)
\(f(n) = \Theta(n^c) \Rightarrow T(n) = \Theta(n^c \log n)\)
\(f(n) = \Omega(n^{c+\epsilon})\) (plus regularity) \(\Rightarrow T(n) = \Theta(f(n))\)
over a bounded number of elements, almost everything is constant time
## 2.6.2. recursion
moving down/right on an NxN grid - each path has length (N-1)+(N-1)
we must move right exactly N-1 times
ans = \(\binom{(N-1)+(N-1)}{N-1}\) - see the sketch below
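A minimal sketch of evaluating that binomial coefficient with the multiplicative formula (the name gridPaths is mine; the intermediate products overflow long for moderately large N):

```java
// number of monotone down/right paths on an N x N grid: C(2(N-1), N-1)
static long gridPaths(int n) {
    int k = n - 1;                // choose which of the 2(N-1) steps go right
    long ans = 1;
    for (int i = 1; i <= k; i++)
        ans = ans * (k + i) / i;  // after step i, ans = C(k+i, i), so the division is exact
    return ans;
}
```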
for recursion, if a list is declared outside a static recursive method, it shouldn't be static
generate permutations - recursive, add a char at each spot (see the sketch after this list)
think hard about the base case before starting
look for lengths that you know
look for symmetry
n-queens - one array of length n, go row by row
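A minimal recursive sketch of the add-a-char-at-each-spot idea, assuming the characters are distinct (names are mine):

```java
import java.util.ArrayList;
import java.util.List;

// permutations of s: recursively permute the tail, then insert the
// first char at every possible spot of every tail permutation
static List<String> permutations(String s) {
    List<String> perms = new ArrayList<>();
    if (s.isEmpty()) {            // base case: one empty permutation
        perms.add("");
        return perms;
    }
    char first = s.charAt(0);
    for (String p : permutations(s.substring(1)))
        for (int i = 0; i <= p.length(); i++)   // each spot, including the ends
            perms.add(p.substring(0, i) + first + p.substring(i));
    return perms;
}
```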
## 2.6.3. dynamic programming
```java
// returns max value for knapsack of capacity W, weights wt, values val
static int knapSack(int W, int[] wt, int[] val) {
    int n = wt.length;
    int[][] K = new int[n + 1][W + 1];
    // build table K[][] in bottom-up manner
    for (int i = 0; i <= n; i++) {
        for (int w = 0; w <= W; w++) {
            if (i == 0 || w == 0)       // base case
                K[i][w] = 0;
            else if (wt[i - 1] <= w)    // max of including the item, not including it
                K[i][w] = Math.max(val[i - 1] + K[i - 1][w - wt[i - 1]], K[i - 1][w]);
            else                        // weight too large
                K[i][w] = K[i - 1][w];
        }
    }
    return K[n][W];
}
```
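A quick sanity check of the function above (the numbers are made up):

```java
int[] wt  = {1, 3, 4, 5};
int[] val = {1, 4, 5, 7};
System.out.println(knapSack(7, wt, val));   // 9: take the weight-3 and weight-4 items
```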
## 2.6.4. min-cut / max-cut
## 2.6.5. hungarian
assign N things to N targets, each assignment with an associated cost - minimize the total cost
## 2.6.6. max-flow
A list of pipes is given, with different flow-capacities. These pipes are connected at their endpoints. What is the maximum amount of water that you can route from a given starting point to a given ending point?
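The notes don't fix an algorithm here, so the sketch below is one standard choice, Edmonds-Karp: repeatedly find a shortest augmenting path by BFS in the residual graph and push flow along it (the adjacency-matrix representation and all names are mine):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// max flow from s to t, given capacities cap[u][v] between n nodes
static int maxFlow(int[][] cap, int s, int t) {
    int n = cap.length;
    int[][] res = new int[n][n];              // residual capacities
    for (int i = 0; i < n; i++) res[i] = cap[i].clone();
    int flow = 0;
    while (true) {
        int[] parent = new int[n];            // BFS tree for the augmenting path
        Arrays.fill(parent, -1);
        parent[s] = s;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty() && parent[t] == -1) {
            int u = q.poll();
            for (int v = 0; v < n; v++)
                if (parent[v] == -1 && res[u][v] > 0) {
                    parent[v] = u;
                    q.add(v);
                }
        }
        if (parent[t] == -1) return flow;     // no augmenting path left: done
        int push = Integer.MAX_VALUE;         // bottleneck capacity on the path
        for (int v = t; v != s; v = parent[v])
            push = Math.min(push, res[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {
            res[parent[v]][v] -= push;        // consume forward capacity
            res[v][parent[v]] += push;        // add back-edge so flow can be undone
        }
        flow += push;
    }
}
```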
## 2.6.7. sorting
you can assume w.l.o.g. all input numbers are unique
sorting requires \(\Omega(n \log n)\) comparisons (proof w/ decision tree: it must distinguish all n! orderings, so its height is at least \(\log_2(n!) = \Omega(n \log n)\))
considerations: worst case, average case, speed in practice, input distribution, stability (relative order of items with equal keys is preserved), in-situ (in-place), stack depth, having to read/write to disk (disk is much slower), parallelizable, online (can handle more data coming in)
adaptive - changes its behavior based on input (ex. bubble sort will stop early on sorted input)
### 2.6.7.1. comparison-based
#### 2.6.7.1.1. bubble sort
keep swapping adjacent pairs:
```
repeat:
    for i = 1 to n-1:
        if a[i+1] < a[i]:
            swap(a, i, i+1)
until a pass does no swaps
```
have a flag that tells if you did no swaps - done
number of passes ~ how far elements are from their final positions
O(n^2)
#### 2.6.7.1.2. odd-even sort
compare-and-swap all adjacent pairs starting at even indices
then all adjacent pairs starting at odd indices; repeat
parallelizable
#### 2.6.7.1.3. selection sort
move the largest remaining element to the current position:
```
for i = n down to 2:
    jmax = 1
    for j = 2 to i:
        if a[j] > a[jmax]: jmax = j
    swap(a, i, jmax)
```
O(n^2)
#### 2.6.7.1.4. insertion sort
insert each item into the sorted prefix:
```
for i = 2 to n:
    insert a[i] into sorted a[1..(i-1)], shifting larger elements right
```
O(n^2), O(nk) where k is max dist from final position
best when almost sorted
#### 2.6.7.1.5. heap sort
insert everything into heap
keep removing the max
can do in place by storing everything in array
can use any height-balanced tree instead of heap
traverse tree to get order
ex. B-tree: multi-rotations occur infrequently, average O(log n) height
O(n log n)
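A minimal sketch of the in-place array version: build a max-heap inside the array, then repeatedly swap the max to the end (names are mine):

```java
// in-place heap sort
static void heapSort(int[] a) {
    int n = a.length;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n);  // heapify
    for (int end = n - 1; end > 0; end--) {
        int t = a[0]; a[0] = a[end]; a[end] = t;             // move max to the end
        siftDown(a, 0, end);                                 // restore the heap
    }
}

static void siftDown(int[] a, int i, int n) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) child++;  // larger child
        if (a[i] >= a[child]) return;
        int t = a[i]; a[i] = a[child]; a[child] = t;
        i = child;
    }
}
```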
#### 2.6.7.1.6. smooth sort
adaptive heapsort
collection of heaps (each one is a factor larger than the one before)
can add and remove in essentially constant time if data is in order
#### 2.6.7.1.7. merge sort
split into smaller arrays, sort, merge
\(T(n) = 2T(n/2) + n = O(n \log n)\)
stable, parallelizable (if parallel, not in place)
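A minimal top-down sketch, merging through an auxiliary buffer (names are mine):

```java
// sorts a[lo..hi) by splitting, sorting halves, and merging
static void mergeSort(int[] a, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = (lo + hi) / 2;
    mergeSort(a, lo, mid);
    mergeSort(a, mid, hi);
    int[] buf = new int[hi - lo];
    int i = lo, j = mid, k = 0;
    while (i < mid && j < hi)                         // take the smaller head;
        buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  // <= keeps the sort stable
    while (i < mid) buf[k++] = a[i++];
    while (j < hi)  buf[k++] = a[j++];
    System.arraycopy(buf, 0, a, lo, buf.length);
}
```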
#### 2.6.7.1.8. quicksort
split on pivot, put smaller elements on left, larger on right
O(n log n) average, O(n^2) worst
O(log n) space (expected recursion-stack depth)
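A minimal sketch using the Lomuto partition; the fixed last-element pivot shown here is exactly what makes already-sorted input hit the O(n^2) worst case, and a randomized pivot avoids that in expectation (names are mine):

```java
// sorts a[lo..hi] by partitioning around a pivot and recurring on both sides
static void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], p = lo;                 // last element as the pivot
    for (int i = lo; i < hi; i++)
        if (a[i] < pivot) { int t = a[i]; a[i] = a[p]; a[p] = t; p++; }
    int t = a[p]; a[p] = a[hi]; a[hi] = t;     // pivot lands in its final spot p
    quickSort(a, lo, p - 1);                   // smaller elements on the left
    quickSort(a, p + 1, hi);                   // larger elements on the right
}
```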
#### 2.6.7.1.9. shell sort
generalize insertion sort
insertion-sort all items i apart where i starts big and then becomes small
sorted after last pass (i=1)
O(n^2), O(n^(3/2)), ... - the exact complexity depends on the gap sequence and isn't fully known
no matter the gap sequence, it must be more than n log n
not used much in practice
### 2.6.7.2. not comparison-based
#### 2.6.7.2.1. counting sort
use values as array indices into a count array
keep count of number of times at each index
for specialized data only, need small values
O(n + k) time, O(k) space
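A minimal sketch, assuming the keys are ints in [0, k) (names are mine):

```java
// counting sort: tally each value, then write the values back in order
static void countingSort(int[] a, int k) {
    int[] count = new int[k];
    for (int x : a) count[x]++;          // count occurrences of each value
    int idx = 0;
    for (int v = 0; v < k; v++)          // replay values in increasing order
        for (int c = 0; c < count[v]; c++)
            a[idx++] = v;
}
```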
#### 2.6.7.2.2. bucket sort
spread data into buckets based on value
sort the buckets
O(n+k) time
buckets could be trees
#### 2.6.7.2.3. radix sort
sort each digit in turn
stable sort on each digit
like bucket sort d times
O(d*(n+k)) time, O(n+k) space
#### 2.6.7.2.4. meta sort
like quicksort, but O(n log n) worst case
run quicksort and mergesort in parallel
stop when the first one finishes
there is overhead, but it doesn't affect the big-oh analysis
average, worst case = O(n log n)
### 2.6.7.3. sorting overview
in exceptional cases insertion-sort or radix-sort are much better than the generic QuickSort / MergeSort / HeapSort answers.
merge sorted arrays a and b - start from the back (see the sketch below)
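A minimal sketch of the back-to-front merge, assuming a has enough trailing space to absorb b (a common setup for this trick; names are mine):

```java
// merge sorted b into sorted a[0..m), where a.length >= m + b.length;
// filling from the back never overwrites an unread element of a
static void mergeFromBack(int[] a, int m, int[] b) {
    int i = m - 1, j = b.length - 1, k = m + b.length - 1;
    while (j >= 0)   // once b is exhausted, a's remaining prefix is already in place
        a[k--] = (i >= 0 && a[i] > b[j]) ? a[i--] : b[j--];
}
```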
## 2.6.8. searching
binary search can't do better than linear if there are duplicates
if data is too large, we need to do external sort (sort parts of it and write them back to file)
write binary search recursively (see the sketch below)
use invariants like a[low] <= val and a[high] >= val so you get correct bounds
binary search in an array with empty strings - make sure that there is a non-empty element at the end of the range
"a".compareTo("b") is -1
we always round up for these
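A minimal recursive sketch of binary search (names are mine):

```java
// returns an index of val in sorted a[lo..hi], or -1 if absent;
// invariant: if val is present, its index is within [lo, hi]
static int binarySearch(int[] a, int val, int lo, int hi) {
    if (lo > hi) return -1;               // empty range: not found
    int mid = lo + (hi - lo) / 2;         // avoids overflow of (lo + hi) / 2
    if (a[mid] == val) return mid;
    if (a[mid] < val)  return binarySearch(a, val, mid + 1, hi);
    return binarySearch(a, val, lo, mid - 1);
}
```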
finding minimum is Ω(n)
pf: assume an element was ignored, that element could have been minimum
simple algorithm - keep track of best so far
thm: n/2 comparisons are necessary because each comparison involves 2 elements
thm: n-1 comparisons are necessary - need to keep track of knowledge gained
every non-min element must win at least once (move from unknown to known)
find min and max
naive solution takes 2n-2 comparisons
pairwise compare all elements, array of maxes, array of mins = n/2 comparisons
check min array, max array = 2* (n/2-1)
3n/2-2 comparisons are sufficient (and necessary)
pf: 4 categories (not tested, only won, only lost, both)
not tested -> won or lost: n/2 comparisons
won or lost -> both: 2*(n/2 - 1) comparisons
therefore 3n/2 - 2 comparisons necessary
find max and next-to-max
thm: n-2 + log(n) comparisons are sufficient
consider elimination tournament, pairwise compare elements repeatedly
2nd best must have lost to the best at some point - look for it among the best's log(n) opponents
selection - find ith largest integer
repeatedly finding the median finds the ith largest
finding the median in linear time yields the ith largest in linear time
T(n) = T(n/2) + M(n) where M(n) is time to find median
quickselect - partition around pivot and recur
average time linear, worst case O(n^2)
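A minimal sketch of quickselect with a Lomuto partition (k is 0-based; names are mine):

```java
// returns the k-th smallest element of a[lo..hi] (0-based k)
static int quickSelect(int[] a, int lo, int hi, int k) {
    if (lo == hi) return a[lo];
    int pivot = a[hi], p = lo;                 // partition around the last element
    for (int i = lo; i < hi; i++)
        if (a[i] < pivot) { int t = a[i]; a[i] = a[p]; a[p] = t; p++; }
    int t = a[p]; a[p] = a[hi]; a[hi] = t;     // pivot is now at its sorted position p
    if (k == p) return a[k];
    return k < p ? quickSelect(a, lo, p - 1, k)   // recur into one side only
                 : quickSelect(a, p + 1, hi, k);
}
```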
median in linear time - quickly eliminate a constant fraction and repeat
partition into n/5 groups of 5
sort each group high to low
find median of each group
compute median of medians recursively
move groups with larger medians to right
move groups with smaller medians to left
now we know 3/10 of elements larger than median of medians
3/10 of elements smaller than median of medians
partition all elements around median of medians
recur like quickselect
guarantees each partition contains at most 7n/10 elements
\(T(n) = T(n/5) + T(7n/10) + O(n)\); superadditivity (\(f(x+y) \geq f(x) + f(y)\)) gives \(T(n/5) + T(7n/10) \leq T(9n/10)\)
so \(T(n) \leq T(9n/10) + O(n)\), which solves to O(n) since 9n/10 < n
## 2.6.9. computational geometry
range queries
input = n points (vectors) with preprocessing
output - number of points within any query rectangle
1D
range query is a pair of binary searches
O(log n) time per query
O(n) space, O(n log n) preprocessing time
2D
subtract out rectangles you don't want
add back things you double subtracted
we want rectangles anchored at origin
nD
make regions by making a grid that includes all points
precompute southwest counts for all regions - different ways to do this - tradeoffs between space and time
O(log n) time per query (after precomputing) - binary search x,y
polygon-point intersection
polygon - a closed sequence of segments
simple polygon - has no intersections
thm (Jordan) - a simple polygon partitions the plane into 3 regions: interior, exterior, boundary
convex polygon - intersection of half-planes
polytope - higher-dimensional polygon
raycasting
count intersections: odd - interior, even - exterior
check for tangent lines, intersecting corners
O(n) time per query, O(1) space and preprocessing time
convex case
preprocessing
find an interior point p (pick a vertex or average the vertices)
partition into wedges (slicing through vertices) w.r.t. p
sort wedges by polar angle
query
find containing wedge (look up by angle)
test interior / exterior
check triangle - cast ray from p to point, see if it crosses edge
O(log n) time per query (we binary search the wedges)
O(n) space and O(n log n) preprocessing time
non-convex case
preprocessing
sort vertices by x
find vertical slices
partition into trapezoids (a triangle is a degenerate trapezoid)
sort slice trapezoids by y
query
find containing slice
find trapezoid in slice
report interior/ exterior
O(log n) time per query (two binary searches)
O(n^2) space and O(n^2) preprocessing time
convex hull
input: set of n points
output: smallest containing convex polygon
simple solution 1 - Jarvis's march
simple solution 2 - Graham's scan
mergehull
partition into two sets - compute the MergeHull of each set
merge the two resulting CHs
pick point p with least x
form angle-monotone chains w.r.t p
merge chains into angle-sorted list
run Graham's scan to form the CH
\(T(n) = 2T(n/2) + n = O(n \log n)\)
generalizes to higher dimensions
parallelizes
quickhull (like quicksort)
find right and left-most points
partition points along this line
find points farthest from line - make quadrilateral
eliminate all internal points
recurse on 4 remaining regions
concatenate resulting CHs
O(n log n) expected time
O(n^2) worst-case time - ex. points on a circle
generalizes to higher dim, parallelizes
lower bound - CH requires Ω(n log n) comparisons
pf - reduce sorting to convex hull
consider an arbitrary set of x_i to be sorted
raise the x_i to the parabola \((x_i, x_i^2)\) - could be any convex function
compute the convex hull of the lifted points - all of them lie on the hull, connected in x-sorted order
from the convex hull we can read off the sorted x_i => the convex hull did the sorting, so it needs at least n log n comparisons
corollary - Grahamâs scan is optimal
Chanâs convex hull algorithm
assume we know CH size m=h
partition points into n/m sets of m each
convex polygon diameter
Voronoi diagrams - input n points - takes O(n log n) time to compute
problems that are solved with it:
Voronoi cell - the set of points closer to any given point than all others form a convex polygon
generalizes to other metrics (not just Euclidean distance)
a Voronoi cell is unbounded if and only if its point is on the convex hull
corollary - the convex hull can be computed in linear time from the Voronoi diagram
Voronoi diagram has at most 2n-5 vertices and 3n-6 edges
every nearest neighbor of a point defines an edge of the Voronoi diagram
corollary - all nearest neighbors can be computed from the Voronoi diagram in linear time
corollary - nearest neighbor search in O(log n) time using planar subdivision search (binary search in 2D)
connecting the points of neighboring Voronoi cells forms a triangulation (Delaunay triangulation)
a Delaunay triangulation maximizes the minimum angle over all triangulations - no long slivery triangles
the Euclidean minimum spanning tree is a subset of the Delaunay triangulation (so it can be computed easily from it)
calculating Voronoi diagram
discrete case / bitmap - expand breadth-first waves from all points
time is O(bitmap size)
time is independent of #points
intersecting half planes
Voronoi cell of a point is intersection of all half-planes induced by the perpendicular bisectors w.r.t all other points
use intersection of convex polygons to intersect half-planes (n log n time per cell)
can be computed in n log n total time
idea: divide and conquer
merging is complex
sweep line using parabolas (Fortune's algorithm)