diff --git a/README.md b/README.md
index a86d6702992a194446ffd36c5de83cddd6869d6d..e0750cf33575ee37119570977e030f9ed8a5d14c 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ Priors are hard to eliminate.
 
 I think this is an interesting approach to control, especially if we do not want to work very hard making perfect models or designing controllers. It is computationally *much* more expensive than an analytically produced ideal controller, but for simple systems it is very tractable.  
 
-To bring this into hardware, my first-next-step is to run a similar search routine over model parameters, aligning them with real-world observations. I can then do expensive forecast-based search offline, while using best-forecast control laws online. 
+To bring this into hardware, my first-next-step is to run a similar search routine over model parameters, aligning them with real-world observations. I can then do expensive forecast-based search offline, while using best-forecast control laws online.
 
 ## The Simulation
 
@@ -101,13 +101,21 @@ There are more, but I need to reestablish my VPN into MIT to access,
 [Automatic Tuning of Optimum PID Controllers](https://digital-library.theiet.org/content/journals/10.1049/ip-d.1993.0030)  
 [Automatic Tuning and Adaption for PID Controllers - a Survey](https://www.sciencedirect.com/science/article/abs/pii/096706619391394C)   
 
-## 'UKX' Control
+## State Space Control
 
-TODO: better explainer, mathjax.
+With PID, we assume we have information about just one state in the system: the current error. PID controllers then build a derivative and an integral term for this error, and assign coefficients to each. In (my cursory understanding of) state space control, we take as much of the system state as is reasonable to measure, and apply control gains to each of these variables to produce an output signal.
 
-So, pid seeming like a black hole of 'OK' success and all, I did the background learning on state space control and LQR, finding that the *result* of LQR optimization is the generation of an ideal 'K' matrix, which is a control law: meaning that if we have a state matrix 'x' - in this case this is our x position, xdot (speed), theta (angle) and tdot (angular velocity), we have a simple matrix K (aka a 'gain matrix') that we multiply with out state matrix (vector? words) to get some 'u' - our control output. In some sense this is *like* PID control, but it operates on system state, not an error... Interestingly (to me), this means we control around the entire state - here I can write laws that will *also* bring the x-position back to center.
+In this example, I have four state variables that I'm interested in: the current cart position $`x`$ and speed $`\dot{x}`$, and the angular position $`\theta`$ and speed $`\dot{\theta}`$. The accelerations of both are governed either by our control input (cart accel == motor torque) or by the ODE that describes the system. Here we will extract one control output by multiplying this state vector by a gains vector $`K`$ of the same length:
 
-I've formulated this into my simulated controller, and by hand tuned the K gains, resulting in control that looks like this:
+```math
+u = -Kx
+```
+
+When I first saw this, I thought it was too simple. We just pick four values for our K to write a 'control law' - and there's even a kind of intuition for it: we know which K[i] multiplies which state variable, so if we know we want to accelerate left when the pendulum leans that way, we can put a negative number in the last two taps.
+
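+As a minimal sketch of that simplicity (in Python, with made-up gain values - the real simulation code isn't shown here), the whole controller is one dot product per time step:
+
+```python
+import numpy as np
+
+# gains for [x, x_dot, theta, theta_dot] - made-up values; the signs
+# depend on your coordinate conventions
+K = np.array([0.6, 1.2, -18.0, -2.4])
+
+def control(state):
+    # u = -Kx: one dot product turns the full state into one control output
+    return -np.dot(K, state)
+```
+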
+What else is cool about this, which PID would not do (unless ... two PID controllers?), is that I can write a law that balances the stick, spins up the pole, and also brings the cart position slowly to a target state at 0.
+
+So, I re-wrote my little simulation-controller and hand-tuned the K gains, resulting in control that looks like this:
 
 ![ukx-by-hand](log/2020-05-17_ukx-by-hand.mp4)
 
@@ -115,16 +123,70 @@ This is satisfying, and is *much* smoother than the PID approach. Since I am abl
 
 ## Search for Gain Matrix
 
-Discussion: how did you search? over gaussian set of starting conditions for set time period. What went wrong? finds many local minima. tried momentum (helps) and tried knocking small simplexes into larger volumes. Neither finds their way out of bad starting conditions. Tried modifying error metrics, sample length, starting condition distributions, to no avail.
+So the obvious move was to try running simplex on the gains matrix. The function evaluation is tricky, though. I tried this in various forms, running each to-be-evaluated gains matrix over fixed time periods.
+
+At first, I was using a fixed set of starting conditions to operate from, and reporting the sum of all errors over the time period. This found controllers, but they were fragile under startup conditions dissimilar to those they were trained on - no surprise.
+
+I moved to drawing samples of between 10 and 1000 starting conditions, each running for up to 10 seconds of simulation time, *at each function evaluation*, to try to produce 'smooth' search landscapes to sample from. I eventually also added momentum, which *did* do wonders to help, but not enough to produce robust controllers; I continued to find controllers that were obviously stuck in local minima.
+
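+A sketch of that evaluation, with a toy cart-pole integrator standing in for my actual simulation (the model, distribution widths, actuator limit, and error metric here are assumptions, not the code I ran):
+
+```python
+import numpy as np
+from scipy.optimize import minimize
+
+G, L, DT = 9.8, 1.0, 0.001  # gravity, pole length, 1 ms time step (assumed)
+
+def step(state, u):
+    # toy cart-pole: treat the control input u as cart acceleration directly
+    x, xd, th, thd = state
+    thdd = (G * np.sin(th) - u * np.cos(th)) / L  # theta measured from upright
+    return np.array([x + xd * DT, xd + u * DT, th + thd * DT, thd + thdd * DT])
+
+def evaluate(K, n_samples=10, duration=10.0):
+    # draw fresh random starting conditions at *every* function evaluation,
+    # hoping to smooth out the search landscape (slow in pure Python - the
+    # point is the structure, not the speed)
+    rng = np.random.default_rng()
+    total = 0.0
+    for _ in range(n_samples):
+        state = rng.normal(0.0, [0.5, 0.5, 0.2, 0.2])  # x, xdot, theta, thetadot
+        for _ in range(int(duration / DT)):
+            u = np.clip(-np.dot(K, state), -50.0, 50.0)  # crude actuator limit
+            state = step(state, u)
+            total += state[2] ** 2  # accumulate angular error
+    return total / n_samples
+
+# simplex (Nelder-Mead) over the four gains
+result = minimize(evaluate, x0=np.zeros(4), method='Nelder-Mead')
+print(result.x)
+```
+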
+In either case, none of these approaches worked when I gave them starting gains matrices (simplex start positions) that were far away from my hand-tuned controller. Especially when starting gains matrices had terms of the 'wrong' sign, search results were essentially useless.
 
 ![ok-simplex](log/2020-05-18_ok-search.mp4)
 
-... one more video
+![ok-simplex](log/2020-05-18_ok-search-02.mp4)
 
 ![local-minima](log/2020-05-18_local-minima.mp4)
 
+It still seems likely to me that this *would* work with a better sampling / evaluation system (perhaps a different / improved error metric? i.e. time-to-balance, not accumulated error?). But I gave up, and moved on to:
+
 ## Continuous Simplex
 
+I then thought: what if I used simplex to search just for the control law to apply *at this time step*, performing one simplex operation per time step? This way I would retain a neighbourhood of 'moves' to try, could re-evaluate each of these at every step, do the next simplex move, and then move on.
+
+This failed miserably, but then I tried:
+
+## Continuous, Fixed-Move, Fixed-Horizon Search
+
+I don't know what to call this; what I do is (see the sketch after the list):
+- given the current state and a gains matrix
+  - generate a grid of gains matrices in the neighbourhood: a kind of fixed-size 'simplex' with a new move in each direction in each dimension
+  - use the simulation to forecast errors over some time horizon, using the current state and the trial gains matrices for each case
+  - pick the gains matrix that has minimal error in this forecast
+- increment the simulation one time step, and repeat the loop
+
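+A sketch of that loop, using the same toy cart-pole model as in the simplex section (the move size, horizon error metric, and exact neighbourhood shape are my guesses, not the code as written):
+
+```python
+import numpy as np
+
+G, L, DT = 9.8, 1.0, 0.001  # same toy model constants as above
+MOVE = 0.1                  # fixed move size in each gain dimension (assumed)
+
+def step(state, u):
+    # same simplified cart-pole integrator as in the simplex section
+    x, xd, th, thd = state
+    thdd = (G * np.sin(th) - u * np.cos(th)) / L
+    return np.array([x + xd * DT, xd + u * DT, th + thd * DT, thd + thdd * DT])
+
+def neighbourhood(K):
+    # K itself, plus a +/- move in every dimension: the fixed-size 'simplex'
+    candidates = [K]
+    for i in range(len(K)):
+        for sign in (1.0, -1.0):
+            d = np.zeros(len(K))
+            d[i] = sign * MOVE
+            candidates.append(K + d)
+    return candidates
+
+def forecast_error(K, state, horizon=1.5):
+    # roll the model forward under u = -Kx, accumulating error over the horizon
+    err = 0.0
+    for _ in range(int(horizon / DT)):
+        state = step(state, -np.dot(K, state))
+        err += state[2] ** 2 + 0.1 * state[0] ** 2  # angle plus weighted position
+    return err
+
+K = np.random.uniform(-1.0, 1.0, 4)     # completely random startup gains
+state = np.array([0.0, 0.0, 0.3, 0.0])  # start leaning 0.3 rad
+for _ in range(5000):                   # 5 seconds of simulation time
+    K = min(neighbourhood(K), key=lambda Kc: forecast_error(Kc, state))
+    state = step(state, -np.dot(K, state))  # advance one real time step
+```
+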
+This is dirty and simple, but it worked. Here are a few trial runs from when I first got this working, running with a 1500ms horizon:
+
+![sg-01](log/2020-05-18_stepgen-long-horizon.mp4)
+
+When the horizon is smaller, this favours aggressive balancing moves, but ignores drift in position. Here it is running on a 500ms horizon:
+
+![sg-02](log/2020-05-18_stepgen-short-horizon.mp4)
+
+Finally, I want to try perturbing this thing... I'll add some keystrokes to knock angular velocity into it.
+
+![sg-03](log/2020-05-18_stepgen-perturbed.mp4)
+
+Most importantly, as an improvement over the previous use of simplex, this can walk out of very bad starting conditions and still succeed. All of the initial conditions in the videos above use completely random startup gains matrices. It's sensitive to the projection time span / horizon 'depth', and to the error function: when more of the error comes from the angular position, it generates controllers that drift; when not enough does, it generates controllers that ignore balancing and just drive position to zero. This is coupled to the forecast depth: if it can't see that a small perturbation in angle *now* will lead to better x-alignment *later* (i.e. there's not enough time to see the result of the lean-to-movement), it won't do that. It's also sensitive to the size of the search 'kernel' (if that's an appropriate word for whatever-this-is).
+
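+One way to write the trade-off described above (my formulation, not code from the simulation) is a weighted sum over the forecast horizon, where the relative sizes of $`w_\theta`$ and $`w_x`$ set the balance-versus-drift behaviour:
+
+```math
+E = \sum_{t=0}^{T} \left( w_\theta \, \theta_t^2 + w_x \, x_t^2 \right)
+```
+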
+# Next Steps
+
+## Improved Move Generation
+
+The move generator (last, above) is neat, but needs refinement:
+- it should settle down when control is stable
+- appropriateness for control: these are not ideal controllers, and we want systems to be deterministic. over time, if conditions become 'normally' stable, small perturbations lead to big problems
+- while walking with the move generator, I could try simultaneously building a map of good control states walked in previous traverses through the field, maybe fit some smoothness around those states, and jump to previously-known best control laws when the state is perturbed into these positions. this is similar, perhaps in spirit, to:
+
+## CMA-ES-ish
+
+The space of all control laws is actually pretty small: just four dimensions in this case. It seems likely that an approach *like* CMA-ES, or simpler, is possible, where I generate a large grid of points in the four-dimensional space, forecast each of their performances, fit some smoothness to those results, and try walking that with a smaller grid, etc.
+
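+A sketch of the simpler, grid-walking version of that idea, without the smoothness fit (the widths and schedule are arbitrary, and `evaluate()` is the sampled-starting-conditions evaluation from the simplex section above):
+
+```python
+import numpy as np
+from itertools import product
+
+def grid_search(center, width, n=5):
+    # evaluate an n^4 grid of gains around `center`, keep the best point
+    axes = [np.linspace(c - width, c + width, n) for c in center]
+    return min((np.array(p) for p in product(*axes)), key=evaluate)
+
+# coarse-to-fine: halve the grid width around each round's winner
+K = np.zeros(4)
+for width in (8.0, 4.0, 2.0, 1.0, 0.5):
+    K = grid_search(K, width)
+```
+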
+## Hardware / Simulation Alignment
+
+I'm more interested in something to do with hardware at the moment... It occurs to me that I could use a similar approach to align hardware observations to a simulation by searching for simulation parameters that match observations. Once I have a 'matched' simulation, I can use that simulation to perform control searches.
+
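+A sketch of that alignment step (the parameter set and the `simulate_with_params()` helper are hypothetical - this is just the shape of the search):
+
+```python
+import numpy as np
+from scipy.optimize import minimize
+
+def mismatch(params, observed):
+    # run the simulation with trial physical parameters (pole length,
+    # friction, ...) from the recorded starting state, compare trajectories
+    predicted = simulate_with_params(params, observed[0], steps=len(observed))
+    return np.sum((predicted - observed) ** 2)
+
+# observed: rows of [x, xdot, theta, thetadot] logged from the hardware
+fit = minimize(mismatch, x0=np.array([1.0, 0.1]), args=(observed,),
+               method='Nelder-Mead')
+```
+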
+For example: though I was running over 45 seconds of forecasts in each one-ms simulation time step, the simulation kept up with real-time rendering on my fancy laptop - and dare I say it isn't very well optimized. This means we could use relatively 'dumb' controllers to execute control laws while big friends elsewhere in the control system w/ beefy processors use their IFT to 'imagine' many possible control laws for the next few ms of real time. This would be *neat*.
+
 ## The Stepper Driver
 
 I want to build a new stepper driver (code) for this. Currently, my driver & hardware pair has these settings, maxed:
@@ -162,26 +224,6 @@ One of the real puzzles here - and I hope this will unlock some secrets for futu
 
 My suspicion is that *sourcing* time from the lower levels is the move, i.e. track state / time advancements from the bottom, and somehow trickle those up.
 
-## The Actual Control
-
-Here's where I know the least... again, starting point is certainly the simulation, with keyboard inputs to play with. I hope to learn from the simulation:
-- what accels, what speeds are necesssary for success (human controller)
-- above, w/ relationships to pendulum weight / length,
-
-From there, I'd also like to plot the phase space of the system. I have some idea that one control strategy would be to run a kind of 'path planning' inside of this space... if any point in the space represents a particular state, plot a course to the state we are interested in being at.
-
-I should also do some lit review, that should start here.
-
-[youtube pendulum 1](https://www.youtube.com/watch?v=XWhGjxdug0o)
-
-### Swing Up and Balancing
-
-I imagine I'll start with balancing, as swing up seems to require some next level magic. In [the first video above](https://www.youtube.com/watch?v=XWhGjxdug0o), these are two distinct controllers!
-
-### Learning
-
-The idea is to craft a best-practices-and-all-the-priors model and controller first, and *then* start cutting pieces out to replace them with learning / search.
-
 # Log
 
 ## 2020 05 18
@@ -248,11 +290,6 @@ This is it for now. I'll write everything up and speculate on future work:
 - relationship between forecast size and system time constant
 - appropriateness for control: these are not ideal controllers, we want systems to be deterministic. over time, if conditions become 'normally' stable, small perturbations lead to big problems - try simultaneously building a map of good control states walked in previous traverses through that state, maybe fit some smoothness around these states, jump to previously-known best control laws when state is perturbed into these positions
 
-### Finishing the Assignment
-
-- write more about LQR, try to understand
-- plot / quantify simplex success on rosenbrock
-
 ## 2020 05 17
 
 Before proceeding, I need a better plan. The original question was about how much of existing control I could replace with search. Now that I'm two days away, it is perhaps apparent to me that I will not likely learn enough about the current controllers to do this, and then pare out results into search methods. First, I should try to learn what I can about pendulum controllers; they're well studied, and perhaps I *can* easily implement best state-of-the-art control.