want big h(n) because A* expands every node with $f(n) < C^*$, so a larger (still admissible) h(n) prunes more of the tree
relaxed problem yields admissible heuristics
schema - representation
could just discretize neighborhood of each state
SGD / gradient descent: line search - keep doubling the step size while the objective keeps improving
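A sketch of that doubling line search (the quadratic objective and starting step size are assumptions for illustration):

```python
# Doubling line search: step along -gradient, doubling the step size
# while the objective keeps improving, and return the last improving point.

def line_search_step(f, grad, x, alpha0=1e-3):
    g = grad(x)
    alpha, best = alpha0, x
    while f(x - alpha * g) < f(best):   # still improving -> double alpha
        best = x - alpha * g
        alpha *= 2
    return best

f = lambda x: (x - 3.0) ** 2            # assumed objective, minimum at 3
grad = lambda x: 2.0 * (x - 3.0)
x = 0.0
for _ in range(20):                     # repeat until converged
    x = line_search_step(f, grad, x)
```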
this is not the search graph!
initialize state values to zero and iteratively update them
Bellman update for each state: $V_{k+1}(s) \leftarrow \underset{a}{\max} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_k(s') \right]$
once we have values for each state, $\pi^*(s) = \underset{a}{\text{argmax}} \: Q^*(s, a)$
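A minimal value-iteration sketch of the two steps above (init to zero, Bellman updates, then extract the greedy policy); the 3-state chain MDP and all its numbers are made up for illustration:

```python
# Value iteration on a hand-made 3-state chain MDP (all numbers assumed).
# T[s][a] = list of (next_state, prob); reward depends only on the
# landing state here, a simplification.

GAMMA = 0.9

T = {
    0: {"stay": [(0, 1.0)], "right": [(1, 0.9), (0, 0.1)]},
    1: {"stay": [(1, 1.0)], "right": [(2, 0.9), (1, 0.1)]},
    2: {"stay": [(2, 1.0)], "right": [(2, 1.0)]},
}
R = {0: 0.0, 1: 0.0, 2: 1.0}  # reward for landing in each state

def q_value(V, s, a, gamma=GAMMA):
    return sum(p * (R[sp] + gamma * V[sp]) for sp, p in T[s][a])

def value_iteration(eps=1e-6):
    V = {s: 0.0 for s in T}                      # init values to zero
    while True:
        # Bellman update for each state
        V_new = {s: max(q_value(V, s, a) for a in T[s]) for s in T}
        if max(abs(V_new[s] - V[s]) for s in T) < eps:  # converged
            return V_new
        V = V_new

def extract_policy(V):
    # pi*(s) = argmax_a Q*(s, a)
    return {s: max(T[s], key=lambda a: q_value(V, s, a)) for s in T}

V = value_iteration()
pi = extract_policy(V)
```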
initialize policy and iteratively update it
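The policy-based variant (alternate evaluating the current policy and greedily improving it) as a self-contained sketch; the 2-state MDP is again an assumption:

```python
# Policy-iteration sketch on a toy 2-state MDP (numbers assumed).
GAMMA = 0.9
T = {0: {"a": [(0, 1.0)], "b": [(1, 1.0)]},   # T[s][act] = [(s', p)]
     1: {"a": [(1, 1.0)], "b": [(1, 1.0)]}}
R = {0: 0.0, 1: 1.0}                          # reward for landing state

def evaluate(pi, sweeps=200):
    # policy evaluation: Bellman updates with the action fixed by pi
    V = {s: 0.0 for s in T}
    for _ in range(sweeps):
        V = {s: sum(p * (R[sp] + GAMMA * V[sp]) for sp, p in T[s][pi[s]])
             for s in T}
    return V

def policy_iteration():
    pi = {s: "a" for s in T}                  # arbitrary initial policy
    while True:
        V = evaluate(pi)
        # policy improvement: greedy one-step lookahead
        new_pi = {s: max(T[s], key=lambda a: sum(
            p * (R[sp] + GAMMA * V[sp]) for sp, p in T[s][a])) for s in T}
        if new_pi == pi:                      # stable -> optimal
            return pi, V
        pi = new_pi

pi, V = policy_iteration()
```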
| path active? | Causal chain | Common cause | Common effect |
|---|---|---|---|
| middle node observed | ❌ | ❌ | ✅ |
| middle node unobserved | ✅ | ✅ | ❌ |
similar to Bayes nets, but we add 2 things:
filtering = state estimation - compute $P(X_t \mid e_{1:t})$
prediction - compute $P(X_{t+k} \mid e_{1:t})$ for $k > 0$
smoothing - compute $P(X_k \mid e_{1:t})$ for $0 \le k < t$
most likely explanation - compute $\underset{x_{1:t}}{\text{argmax}} \: P(x_{1:t} \mid e_{1:t})$
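Filtering can be sketched with the forward algorithm; the numbers below are the umbrella model worked through in Russell & Norvig:

```python
# Forward algorithm (filtering) on the R&N umbrella HMM.
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},      # P(X_t | X_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
EMIT = {"rain": {"umbrella": 0.9, "none": 0.1},  # P(e_t | X_t)
        "sun":  {"umbrella": 0.2, "none": 0.8}}
PRIOR = {"rain": 0.5, "sun": 0.5}

def filter_step(belief, evidence):
    # predict: push the belief through the transition model
    predicted = {s: sum(belief[sp] * TRANS[sp][s] for sp in belief)
                 for s in TRANS}
    # update: weight by the evidence likelihood, then normalize
    unnorm = {s: EMIT[s][evidence] * predicted[s] for s in predicted}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

belief = PRIOR
for e in ["umbrella", "umbrella"]:
    belief = filter_step(belief, e)
# belief["rain"] matches the book's worked example (~0.883)
```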
sample a bunch of particles and use them to approximate probabilities:
speed-ups when num particles < num possible states
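A particle-filter sketch of the elapse-time / weight / resample loop; the model numbers reuse the umbrella HMM from Russell & Norvig, and the particle count is an arbitrary assumption:

```python
import random

# Particle filtering: approximate the filtering distribution with samples.
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},      # P(X_t | X_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
EMIT = {"rain": {"umbrella": 0.9, "none": 0.1},  # P(e_t | X_t)
        "sun":  {"umbrella": 0.2, "none": 0.8}}

def particle_filter_step(particles, evidence):
    # 1) elapse time: move each particle by sampling the transition model
    moved = [random.choices(list(TRANS[p]), weights=TRANS[p].values())[0]
             for p in particles]
    # 2) weight each particle by the evidence likelihood,
    # 3) resample proportionally to the weights
    weights = [EMIT[p][evidence] for p in moved]
    return random.choices(moved, weights=weights, k=len(particles))

random.seed(0)
particles = ["rain"] * 500 + ["sun"] * 500       # uniform prior
for e in ["umbrella", "umbrella"]:
    particles = particle_filter_step(particles, e)
p_rain = particles.count("rain") / len(particles)  # near the exact 0.883
```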
first-order logic: add objects, relations, quantifiers ($\forall$, $\exists$)
simple: first-order logic forward-chaining: FOL-FC-ASK
this restates the relevant equations from Russell & Norvig (all based on the utility function U(s))
given: an MDP with unknown transition model and rewards (we only see sampled transitions from acting)
find: the optimal policy $\pi^*(s)$
here only Q-learning: $Q(s, a) \leftarrow (1 - \alpha) \, Q(s, a) + \alpha \left[ r + \gamma \: \underset{a'}{\max} \: Q(s', a') \right]$
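A tabular Q-learning sketch of that update with epsilon-greedy exploration; the chain environment, constants, and reward are all assumptions:

```python
import random

# Tabular Q-learning on a tiny assumed chain: states 0..3, actions
# left/right, reward 1 for reaching the terminal state 3.

ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
ACTIONS = ["left", "right"]

def step(s, a):
    # deterministic toy dynamics; episode ends at state 3
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

random.seed(0)
Q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}

for _ in range(500):                               # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2, r, done = step(s, a)
        # sample-based Bellman optimality backup
        target = r + (0.0 if done
                      else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(3)}
```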