
Inverted Pendulum

To finish Neil's maths class and to walk myself through some control basics, I'm going to try to control an inverted pendulum system.

TL;DR:

What are you asking?
To what extent can we replace classic 'ideal' controllers with search algorithms? In this case, for an inverted pendulum-on-a-cart system.

Who's done what before?
Most approaches use LQR controllers that find ideal control laws given a model of the system. LQR 'state space' controllers can perform pendulum swing-ups as well as balance upright pendulums; PID controllers can only do the latter.

What did you do?
I built a small simulation in JavaScript. I controlled it with a hand-tuned PID controller before developing a state-space controller for the same system, also hand-tuning its control law.

I then tried various search algorithms to find an ideal control law without any model or derivative information.

A simplex method that evaluated control laws over a fixed time interval was prone to local minima, and was only remotely successful when given starting conditions within the neighbourhood of my hand-tuned control law.

A simplex method that evaluated, at each simulation step, possible new control laws over a fixed horizon performed slightly better, but still tracked into local minima: it developed control laws that minimized error over the short horizon, but could not find ideal controllers for the full state space.

I then developed a local search method: at each time step it evaluates control laws in the neighbourhood of the currently-implemented control law, forecasting their performance over a fixed interval into the future. It picks the best one, proceeds to the next simulation step, and repeats.
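
A minimal sketch of that loop, in the same JavaScript as the simulation - the gain step size, horizon, and the forecastError callback (which would wrap the sim rollout and error metric) are illustrative assumptions, not the exact code:

```js
// sketch only: pick the best control law in the neighbourhood of the current
// one by forecasting each candidate over a short horizon. forecastError is a
// hypothetical callback wrapping the sim rollout + accumulated-error metric.
const GAIN_STEP = 0.05; // neighbourhood size around each gain (assumption)
const HORIZON = 2.0;    // seconds to forecast ahead (assumption)

function localSearchStep(state, K, forecastError) {
  // candidates: the current law, plus one nudge up / down per gain
  const candidates = [K.slice()];
  for (let i = 0; i < K.length; i++) {
    for (const delta of [-GAIN_STEP, GAIN_STEP]) {
      const c = K.slice();
      c[i] += delta;
      candidates.push(c);
    }
  }
  // forecast each candidate from the current state and keep the best
  let best = candidates[0];
  let bestErr = forecastError(state, best, HORIZON);
  for (const c of candidates.slice(1)) {
    const err = forecastError(state, c, HORIZON);
    if (err < bestErr) { bestErr = err; best = c; }
  }
  return best; // apply for one sim step, then repeat from the new state
}
```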

How did you do it?

JavaScript.

What questions did you need to answer?

What does it mean to make a model-predictive controller?
What is 'state space' control?

What did you learn?

State space control: neat.
Computing is much faster than most of the physical systems whose 'low-level' control I'm interested in, so forecasting one or two seconds into the future can be done thousands of times per second.
Priors are hard to eliminate.

What are the implications for future work?

I think this is an interesting approach to control, especially if we do not want to work very hard at making perfect models or designing controllers. It is computationally much more expensive than an analytically produced ideal controller, but for simple systems it is very tractable.

To bring this into hardware, my first-next-step is to run a similar search routine over model parameters, aligning them with real-world observations. I can then do expensive forecast-based search offline, while using best-forecast control laws online.

The Simulation

One first step is building an ODE simulation of a pendulum, and rendering that. I can do this pretty easily by pulling code from my earlier javascript pendulum, just moving the platform left/right instead of up/down. I should also hope to build an understanding of how simulation time / weight / units relate to world units, so that I can match it against the world.

OK, to start, the pendulum itself can be modelled with:

\ddot{\theta} = \frac{g}{l}\sin  \theta

Where \theta is the angle of the pendulum, g is gravity, and l is the length of the pendulum.

This makes sense: the angular acceleration is equal to the force on the pendulum, which is a function of gravity, the length of the stick, and its current angle. But what of the cart?

l\ddot{\theta} - g\sin\theta = \ddot{x}\cos\theta

Where x is the position of the cart. This also makes sense! The acceleration of the cart (\ddot{x}) is related to the horizontal component of the angular acceleration of the pendulum.

In my tiny simulation, I'm going to drive the cart acceleration: I'll do this in hardware as well, so I should be able to expect some \ddot{\theta} given an \ddot{x}:

\ddot{\theta} = \frac{\ddot{x}\cos\theta + g\sin\theta}{l}

I'm pretty sure this is enough to let me write some javascript. Since time (i.e. computing) is cheap and this is simple, I can even default to an Euler method.
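
For concreteness, a minimal Euler step over the equation above - the g, l, and dt values here are placeholders, not necessarily what the sim uses:

```js
// forward-Euler step of the pendulum-on-cart ODE above.
// state: { theta (rad), thetaDot (rad/s) }, input: cart acceleration xDdot (m/s^2).
const g = 9.81;  // gravity, m/s^2 (placeholder)
const l = 0.5;   // pendulum length, m (placeholder)

function stepPendulum(state, xDdot, dt) {
  const thetaDdot = (xDdot * Math.cos(state.theta) + g * Math.sin(state.theta)) / l;
  return {
    theta: state.theta + state.thetaDot * dt,
    thetaDot: state.thetaDot + thetaDdot * dt,
  };
}

// e.g. a free swing from just off vertical, with the cart held still:
let s = { theta: 0.1, thetaDot: 0 };
for (let i = 0; i < 1000; i++) s = stepPendulum(s, 0, 0.001);
```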

Yeah, this is fine... here's the free swing; obviously we are used to the real world being damped:

hello-p

I'll add keys... ok, just want to get a clip of nearly balancing this human-in-the-loop... it's tough, time is not real yet either:

control

OK, for time, I'll use JS timers... Got it, running in realtime now. OK.

Cool, this works, and Euler is probably OK because we are already running in realtime. If anything, do better at JS handoffs, events, protocol, etc.

control
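
Roughly what pinning the sim to wall-clock time looks like with JS timers, reusing the stepPendulum sketch above - the 10 ms tick and substep size are assumptions:

```js
// measure elapsed real time each tick and integrate the sim by that much,
// in small fixed substeps so Euler stays well behaved.
const SUBSTEP = 0.001; // s (assumption)
let lastTime = performance.now();

setInterval(() => {
  const now = performance.now();
  let elapsed = (now - lastTime) / 1000; // real seconds since last tick
  lastTime = now;
  while (elapsed > 0) {
    const h = Math.min(elapsed, SUBSTEP);
    s = stepPendulum(s, 0, h); // from the Euler sketch above
    elapsed -= h;
  }
  // render(s) would happen here
}, 10);
```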

PID Control in Simulation

I've tooled around with PID now, so I can adjust these parameters by hand; next up is searching for PID params.

pid-by-hand
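
The PID law itself is the usual one; a minimal sketch, assuming the error is the pendulum angle (upright = zero) and the output is a cart acceleration - the gains are placeholders, not my tuned values:

```js
// classic PID on the pendulum angle; output drives the cart acceleration.
const gains = { kp: 40, ki: 0.5, kd: 8 }; // placeholders
let integral = 0;

function pid(theta, thetaDot, dt) {
  const error = 0 - theta;   // target is upright, theta = 0
  integral += error * dt;    // accumulate for the I term
  const dError = -thetaDot;  // d(error)/dt, from the measured rate
  return gains.kp * error + gains.ki * integral + gains.kd * dError;
}
```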

Search for PID

Autotuning is a world. In fact, with PID being the duct-tape holding many feedback controllers together, having a robust tuning algorithm for PID might be the spiritual equivalent of free duct tape for life: it means you won't make many elegant things, but the things you do make, will work.

1999: Automatic PID Tuning w/o a Plant Model

This paper proposes a method where all we need to tune our PID gains is past data: no knowledge of the plant. It also proposes to do so in realtime, that is, alongside control. My previous idea was to run a series of tests in the simulation and walk a simplex along an evaluation that ran over some set time (say 10 s of balancing), with accumulated error as a metric. The method here would likely be more of a wonderful-tool-belt item for control generally, but is less related to my plans.

There are more, but I need to reestablish my VPN into MIT to access them:

Automatic Tuning of Optimum PID Controllers
Automatic Tuning and Adaption for PID Controllers - a Survey

'UKX' Control

TODO: better explainer, mathjax.

So, with PID seeming like a black hole of 'OK' success, I did the background learning on state space control and LQR. The result of LQR optimization is an ideal 'K' matrix, which is a control law: if we have a state vector 'x' - in this case our x position, xdot (speed), theta (angle) and thetadot (angular velocity) - then K (aka a 'gain matrix') is a simple matrix that we multiply with the state vector to get some 'u', our control output. In some sense this is like PID control, but it operates on the system state, not on an error... Interestingly (to me), this means we control around the entire state - here I can write laws that will also bring the x-position back to center.
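
Pending that better explainer, the control law written out (in textbook form the minus sign is usually pulled out front as u = -Kx; here it can just live inside the gains):

u = Kx = k_1 x + k_2 \dot{x} + k_3 \theta + k_4 \dot{\theta}

Where K = [k_1 \; k_2 \; k_3 \; k_4] is the gain matrix and x = [x \; \dot{x} \; \theta \; \dot{\theta}]^T is the state vector.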

I've formulated this into my simulated controller, and by hand tuned the K gains, resulting in control that looks like this:

ukx-by-hand

This is satisfying, and is much smoother than the PID approach. Since I am able to get this close by hand, I have good faith that I can write some simplex to search over simulations to find an ideal K matrix, instead of using LQR (which requires more detailed knowledge of the equations that characterize the system). That's next.

Search for Gain Matrix

Discussion: how did I search? Over a Gaussian set of starting conditions, for a set time period. What went wrong? It finds many local minima. I tried momentum (it helps) and tried knocking small simplexes into larger volumes; neither finds its way out of bad starting conditions. I also tried modifying error metrics, sample lengths, and starting-condition distributions, to no avail.
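
The shape of the fitness the simplex walks on, sketched here with a hypothetical gaussian() sampler and simStep() rollout - the sample counts, window, and error weighting are assumptions rather than my exact metric:

```js
// score a candidate gain matrix K by rolling the sim out for a fixed window
// from gaussian-random starting conditions and accumulating error.
function evaluateK(K, samples = 10, duration = 10, h = 0.01) {
  let total = 0;
  for (let i = 0; i < samples; i++) {
    // gaussian-distributed start: [x, xDot, theta, thetaDot]
    let st = { x: gaussian(0, 0.1), xDot: 0, theta: gaussian(0, 0.2), thetaDot: 0 };
    for (let t = 0; t < duration; t += h) {
      const u = K[0] * st.x + K[1] * st.xDot + K[2] * st.theta + K[3] * st.thetaDot;
      st = simStep(st, u, h); // hypothetical: one step of the cart-pendulum sim
      total += Math.abs(st.theta) + Math.abs(st.x); // accumulated error
    }
  }
  return total; // lower is better; the simplex walks downhill on this
}
```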

ok-simplex

... one more video

local-minima

Continuous Simplex

The Stepper Driver

I want to build a new stepper driver (code) for this. Currently, my driver & hardware pair has these settings, maxed:

| Setting | Units | Value | Alt Units | Alt Value |
| --- | --- | --- | --- | --- |
| steps / mm | - | 231 | - | - |
| acceleration | mm/s^2 | 30 | m/s^2 | 0.03 |
| top speed | mm/s | 20 | m/s | 0.02 |

It would seem as though I've made some mistakes in implementation, because this seems pitifully small in terms of 'Gs' - yet observed acceleration seems to be near or beyond 1 G.

  • first, cleanup existing code
  • steps/unit as a float,
  • underlying state / operation same as current, but understand and document limits
  • stepper becomes responsible for acceleration, watch world units, timers
  • stepper should become responsible for limits: I want to bake safety into hardware. as a result, it will also be master of its own absolute position.
  • to start down the new path, I'll roll it up to accept acceleration commands, in world units m/s^2 (the unit conversion is sketched after this list)
  • eventually I'll want to bundle data packets... RIP msegs
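
The unit conversion for those acceleration commands is just arithmetic, sketched here in JS (rather than the driver's own code) with the steps-per-mm figure from the table above:

```js
// world-unit acceleration command (m/s^2) -> step-rate acceleration (steps/s^2)
const stepsPerMm = 231;

function accelToSteps(accelMetersPerSec2) {
  const accelMmPerSec2 = accelMetersPerSec2 * 1000; // m -> mm
  return accelMmPerSec2 * stepsPerMm;               // mm/s^2 -> steps/s^2
}

accelToSteps(0.03); // the table's 0.03 m/s^2 ceiling is 6930 steps/s^2
```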

The Pendulum Hardware

I'm planning on using this axis type as the 'cart' for the pendulum project. This is part of a larger machine project I am working on in the meantime.

ratchet machine

This might prove to have some limits: these are designed to accelerate at a decent clip, but not to hit huge top speeds. The 6.5:1 reduction there really kills any speed desires. I think I'll get through the simulation, get a sense of what kinds of accelerations and speeds are required, and revisit this if necessary.

There's some CAD and sensor reading here as well - I'll need to make / print a pendulum bearing / encoder situation, write an encoder driver, and then very likely filter that encoder data to build a state estimator for its current position.
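
One simple option for that filter, sketched as exponential smoothing over a finite-differenced rate - the smoothing factor and 1 kHz read rate are assumptions, not requirements I've measured:

```js
// estimate pendulum angle + rate from raw encoder angle readings:
// finite-difference for velocity, then exponentially smooth it.
const ALPHA = 0.2;      // smoothing factor, 0..1 (lower = smoother, laggier)
const READ_DT = 0.001;  // s, assuming ~1 kHz encoder reads

let lastAngle = 0;
let filteredRate = 0;

function estimateState(angle) {
  const rawRate = (angle - lastAngle) / READ_DT;    // finite-difference velocity
  lastAngle = angle;
  filteredRate += ALPHA * (rawRate - filteredRate); // exponential smoothing
  return { theta: angle, thetaDot: filteredRate };
}
```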

Software Architecture

One of the real puzzles here - and I hope this will unlock some secrets for future, more general purpose control of machines - is how to handle the relationship between realtime-worlds (embedded code and genuine physics) and asynchronous worlds - like the javascript / networked runtime and simulation.

My suspicion is that sourcing time from the lower levels is the move, i.e. track state / time advancements from the bottom, and somehow trickle those up.

The Actual Control

Here's where I know the least... again, starting point is certainly the simulation, with keyboard inputs to play with. I hope to learn from the simulation:

  • what accels, what speeds are necessary for success (human controller)
  • above, w/ relationships to pendulum weight / length,

From there, I'd also like to plot the phase space of the system. I have some idea that one control strategy would be to run a kind of 'path planning' inside of this space... if any point in the space represents a particular state, plot a course to the state we are interested in being at.

I should also do some lit review; that should start here.

youtube pendulum 1

Swing Up and Balancing

I imagine I'll start with balancing, as swing up seems to require some next level magic. In the first video above, these are two distinct controllers!

Learning

The idea is to craft a best-practices-and-all-the-priors model and controller first, and then start cutting pieces out to replace them with learning / search.

Log

2020 05 18

Well, this is still a bit of a useless hunk. Search (when given decent starting conditions) with simplex does seem to walk uphill for a minute, but quickly settles into some local minima. There are a few things:

Much Larger Sample

My first task this morning is to sample from a big ol' data set. With 10 gaussian samples, it seems to already work much better. I'll try 100, then 1000, and then try asserting speed limits and use-of-accel penalties.
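
A guess at how those penalties could fold into the per-step error from evaluateK above - the weights are placeholders; the speed limit is the one from the stepper table:

```js
// extra per-step cost terms: punish exceeding the hardware speed limit hard,
// and spending acceleration a little.
const MAX_SPEED = 0.02;    // m/s, from the stepper table above
const SPEED_WEIGHT = 100;  // placeholder
const ACCEL_WEIGHT = 0.1;  // placeholder

function stepPenalty(state, u) {
  const overspeed = Math.max(0, Math.abs(state.xDot) - MAX_SPEED);
  return SPEED_WEIGHT * overspeed + ACCEL_WEIGHT * Math.abs(u);
}
```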

On a Gaussian sample size of 100, with 10 s simulations and random start conditions, this is honest-to-god starting to work, although it still gets lots of help from the starting conditions. I should put a video in here... and maybe first test truly random startup values.

ok-simplex

ok-two

This is ... better, but still not great. It does terribly when it isn't given the right quadrant to start operating in.

Well, I think I can conclude that this approach is fraught with local-minima issues. I'm going to stash this and move on to a continuous simplex experiment.