


want big h(n) (while keeping it admissible), because A* expands every node with f(n) < C* - a larger h prunes more of them
relaxed problem (drop constraints) yields admissible heuristics
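a concrete instance of the relaxation idea (the grid-pathfinding setting and names here are my own illustration): dropping the walls from a maze gives Manhattan distance, the exact cost of the relaxed problem, so it never overestimates the true cost:

```python
# Manhattan distance: exact solution cost of the relaxed problem
# (grid pathfinding with all walls removed), hence admissible for
# the original problem. States are illustrative (x, y) tuples.
def manhattan(state, goal):
    (x1, y1), (x2, y2) = state, goal
    return abs(x1 - x2) + abs(y1 - y2)
```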

schema - a representation where some positions are left unspecified (genetic algorithms)
continuous spaces: could just discretize the neighborhood of each state
SGD / gradient descent: line search - keep doubling the step size until the objective stops improving
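one way to read the line-search note, as a sketch (the function names and the step-doubling loop are my own illustration, not a specific library):

```python
# Crude line search along the negative gradient: keep doubling the
# step size alpha while the objective keeps improving, then take the
# last improving step. f, grad, and alpha are illustrative.
def line_search_step(f, grad, x, alpha=1e-3):
    g = grad(x)
    best = f(x - alpha * g)
    while f(x - 2 * alpha * g) < best:   # does doubling still help?
        alpha *= 2
        best = f(x - alpha * g)
    return x - alpha * g
```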

this is not the search graph!



initialize state values to zero and iteratively update them with the Bellman update for each state:

$V_{k+1}(s) = \underset{a}{\max} \: \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_k(s') \right]$
once we have values for each state, $\pi^*(s) = \underset{a}{\text{argmax}} \: Q^*(s, a)$
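the two steps above can be sketched as follows (the tiny MDP encoding, with `T[s][a]` as a list of `(prob, next_state, reward)` triples, is my own illustrative convention):

```python
# Value iteration sketch on a dict-encoded MDP.
def value_iteration(states, actions, T, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}          # initialize values to zero
    while True:
        delta = 0.0
        for s in states:
            # Bellman update: best expected value over actions
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# pi*(s) = argmax_a Q*(s, a), using the converged values
def extract_policy(states, actions, T, V, gamma=0.9):
    return {s: max(actions, key=lambda a: sum(
                p * (r + gamma * V[s2]) for p, s2, r in T[s][a]))
            for s in states}
```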
initialize a policy and iteratively update it: evaluate the current policy, then improve it greedily against those values, until the policy stops changing
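a matching sketch of that evaluate/improve loop, using the same illustrative `T[s][a] = [(prob, next_state, reward), ...]` convention:

```python
# Policy iteration sketch: alternate evaluation and greedy improvement.
def policy_iteration(states, actions, T, gamma=0.9, eps=1e-6):
    pi = {s: actions[0] for s in states}   # arbitrary initial policy
    while True:
        # policy evaluation: iterate V under the fixed policy pi
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][pi[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                break
        # policy improvement: greedy one-step lookahead
        changed = False
        for s in states:
            best = max(actions, key=lambda a: sum(
                p * (r + gamma * V[s2]) for p, s2, r in T[s][a]))
            if best != pi[s]:
                pi[s], changed = best, True
        if not changed:
            return pi, V
```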
| | Causal chain | Common cause | Common effect |
|---|---|---|---|
| | X → Y → Z | X ← Y → Z | X → Y ← Z |
| X ⊥ Z? | ❌ | ❌ | ✅ |
| X ⊥ Z \| Y? | ✅ | ✅ | ❌ |
similar to Bayes nets, but we add 2 things: a transition model $P(X_t \mid X_{t-1})$ and a sensor model $P(E_t \mid X_t)$
filtering = state estimation - compute $P(X_t \mid e_{1:t})$
prediction - compute $P(X_{t+k} \mid e_{1:t})$ for $k > 0$
smoothing - compute $P(X_k \mid e_{1:t})$ for $0 \le k < t$
most likely explanation - compute $\underset{x_{1:t}}{\text{argmax}} \: P(x_{1:t} \mid e_{1:t})$
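filtering can be sketched with the forward algorithm; the dict encoding is my own, and the test numbers are the classic umbrella-world illustration:

```python
# Forward algorithm: filtering P(X_t | e_{1:t}) by alternating a
# prediction step (sum over previous states) and an evidence update.
def forward(prior, transition, sensor, evidence):
    """prior: dict state -> prob; transition[s][s2] = P(s2 | s);
    sensor[s][e] = P(e | s); evidence: list of observations."""
    f = dict(prior)
    for e in evidence:
        # predict next state, then weight by the evidence likelihood
        f = {s2: sensor[s2][e] * sum(f[s] * transition[s][s2] for s in f)
             for s2 in f}
        z = sum(f.values())               # normalize
        f = {s: p / z for s, p in f.items()}
    return f
```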


sample a bunch of particles and use them to approximate probabilities:
speed-up when the number of particles < the number of possible states
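a minimal sketch of the propagate / weight / resample loop (the transition sampler and sensor model passed in are assumed toy inputs, not a specific library API):

```python
import random

# Particle filter sketch: propagate, weight by evidence, resample.
def particle_filter(particles, transition_sample, sensor, evidence):
    for e in evidence:
        # 1. propagate each particle through the transition model
        particles = [transition_sample(p) for p in particles]
        # 2. weight each particle by the evidence likelihood
        weights = [sensor[p][e] for p in particles]
        # 3. resample particles in proportion to their weights
        particles = random.choices(particles, weights=weights,
                                   k=len(particles))
    return particles
```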

first-order logic: add objects, relations, quantifiers ($\forall$, $\exists$)
simple: first-order logic forward-chaining: FOL-FC-ASK
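a toy sketch of the forward-chaining idea (flat atoms only, lowercase names treated as variables - a drastic simplification of the real FOL-FC-ASK, and all predicate/constant names are my own illustration):

```python
# Unify two flat atoms like ('Parent', 'x', 'Bob') under substitution s;
# lowercase symbols are variables, capitalized symbols are constants.
def unify(x, y, s):
    if s is None or len(x) != len(y) or x[0] != y[0]:
        return None
    for a, b in zip(x[1:], y[1:]):
        a, b = s.get(a, a), s.get(b, b)
        if a.islower():                  # a is an unbound variable
            s = {**s, a: b}
        elif b.islower():                # b is an unbound variable
            s = {**s, b: a}
        elif a != b:                     # two different constants
            return None
    return s

# Forward chaining: apply each rule's premises against known facts,
# add newly derived facts, repeat until nothing changes.
def fol_fc_ask(rules, facts, query):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            subs = [{}]
            for p in premises:
                subs = [s2 for s in subs for f in facts
                        if (s2 := unify(p, f, s)) is not None]
            for s in subs:
                new = tuple(s.get(t, t) for t in conclusion)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return any(unify(query, f, {}) is not None for f in facts)
```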

this restates the relevant equations from Russell & Norvig (all based on the utility function $U(s)$)
given: an MDP whose transition model $T(s, a, s')$ and reward function $R(s, a, s')$ are unknown - we only observe samples
find: the optimal policy $\pi^*(s)$
here we cover only Q-learning:
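a tabular Q-learning sketch on a made-up 4-state chain (the environment, reward, and hyperparameters are all illustrative, not from the notes):

```python
import random
from collections import defaultdict

# Tabular Q-learning on a toy deterministic chain MDP:
# states 0..3, reward 1.0 on reaching the terminal state 3.
def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    actions = ['left', 'right']
    Q = defaultdict(float)                     # Q[(s, a)], default 0

    def step(s, a):
        s2 = min(s + 1, 3) if a == 'right' else max(s - 1, 0)
        r = 1.0 if s2 == 3 else 0.0
        return s2, r, s2 == 3                  # next state, reward, done

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(s, a)])
            s2, r, done = step(s, a)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

note that Q-learning is off-policy: it learns about the greedy policy while acting epsilon-greedily.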