---
title: Problem Set 8 (Optimization)
---
This week's questions are all about exploring various optimization algorithms, so I won't restate
them here. This page summarizes my results. For notes on derivation and implementation, see
[here](../notes/optimization.html). Code is [here](https://gitlab.cba.mit.edu/erik/optimization).

I implemented Nelder-Mead, naive gradient descent (always step by a constant factor times the
gradient), and conjugate gradient descent. For the latter I initially used a golden section line
search, but later replaced it with Brent's method. I still want to update it to use Brent's method
with derivatives. I also played around with the reference CMA-ES implementation for comparison.
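
To make "naive" concrete, here's a minimal sketch of the fixed-step update on the 2d Rosenbrock
function. This is illustrative rather than the code from the repo above; the step size, starting
point, and termination limits are arbitrary choices, not the ones used for the results below.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Gradient of the 2d Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
std::vector<double> rosenbrock_gradient(const std::vector<double>& x) {
    return {-2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] * x[0]),
            200.0 * (x[1] - x[0] * x[0])};
}

int main() {
    std::vector<double> x = {-1.2, 1.0}; // a common Rosenbrock starting point
    double const gamma = 1e-4;           // fixed step size (illustrative)
    int n_evals = 0;
    while (n_evals < 1000000) {
        std::vector<double> g = rosenbrock_gradient(x);
        ++n_evals;
        // Stop once the gradient norm criterion is met (or the evaluation cap hits).
        if (std::hypot(g[0], g[1]) < 1e-8) break;
        // The "naive" update: always step by a constant factor times the gradient.
        x[0] -= gamma * g[0];
        x[1] -= gamma * g[1];
    }
    std::printf("x = (%g, %g) after %d gradient evaluations\n", x[0], x[1], n_evals);
    return 0;
}
```
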
## Visualizations
Nelder-Mead 2d
![Nelder-Mead 2d](../assets/img/08_nelder_mead_plot_2d.png)

Nelder-Mead 10d
![Nelder-Mead 10d](../assets/img/08_nelder_mead_plot_10d.png)

Naive Gradient Descent 2d
![Naive Gradient Descent 2d](../assets/img/08_gradient_descent_plot_2d.png)

Naive Gradient Descent 10d
![Naive Gradient Descent 10d](../assets/img/08_gradient_descent_plot_10d.png)

Conjugate Gradient Descent 2d
![Conjugate Gradient Descent 2d](../assets/img/08_conjugate_gradient_descent_plot_2d.png)

Conjugate Gradient Descent 10d
![Conjugate Gradient Descent 10d](../assets/img/08_conjugate_gradient_descent_plot_10d.png)

CMA-ES 2d
![CMA-ES 2d](../assets/img/08_cma_es_plot_2d.png)

CMA-ES 10d
![CMA-ES 10d](../assets/img/08_cma_es_plot_10d.png)

## Performance Comparisons
I haven't made a pretty chart for this yet, since I'm still playing with the parameters (to make it
as fair a fight as possible). But initial results suggest that on the multi-dimensional Rosenbrock
function, conjugate gradient descent is the clear winner in scaling, at least through one thousand
dimensions.
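
For concreteness, by the multi-dimensional Rosenbrock function I mean the usual generalization
sketched below (this is one common formulation; the repo's exact variant may differ in details).

```cpp
#include <cstddef>
#include <vector>

// Generalized n-dimensional Rosenbrock function (one common formulation):
//   f(x) = sum_{i=0}^{n-2} [ 100 (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 ]
// The global minimum is f = 0 at x = (1, 1, ..., 1).
double rosenbrock(const std::vector<double>& x) {
    double f = 0.0;
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        double const a = x[i + 1] - x[i] * x[i];
        double const b = 1.0 - x[i];
        f += 100.0 * a * a + b * b;
    }
    return f;
}

// Analytic gradient of the same function, as used by the gradient-based methods.
std::vector<double> rosenbrock_gradient(const std::vector<double>& x) {
    std::vector<double> g(x.size(), 0.0);
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        double const a = x[i + 1] - x[i] * x[i];
        g[i] += -400.0 * x[i] * a - 2.0 * (1.0 - x[i]);
        g[i + 1] += 200.0 * a;
    }
    return g;
}
```
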
Here are some raw numbers. Nelder-Mead and CMA-ES are asked to reach a tolerance of 1e-8, while the
gradient descent methods are asked to find points where the gradient norm is less than 1e-8. Listed
below are the numbers of function evaluations required to reach those goals, capped at 1e6.
| Dimension | Nelder-Mead | Gradient Descent | Conjugate Gradient | CMA-ES |
| --- | --- | --- | --- | --- |
| 2 | 193 | 28527 | 1021 | 870 |
| 4 | 562 | 35724 | 9448 | 2760 |
| 8 | 1591 | 45049 | 5987 | 5790 |
| 16 | 27440 | 124637 | 13083 | 14292 |
| 32 | 78969 | 200567 | 13898 | 57876 |
| 64 | 561195 | 455578 | 15194 | 211616 |
| 128 | 1000001 | 564451 | 19565 | 953694 |
| 256 | 1000001 | 782196 | 28110 | 1000000 |
| 512 | 1000001 | 1000000 | 46918 | 1000000 |
| 1024 | 1000001 | 1000000 | 10055 | 1000000 |

As a final note, all the algorithms except CMA-ES spent the majority of their time evaluating the
function. But CMA-ES imposes large computational costs of its own, since it essentially has to
invert a covariance matrix (or at least solve an eigenvalue problem).
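
Concretely, that cost shows up in the sampling step: CMA-ES draws candidates from N(m, σ²C), which
is typically done by eigendecomposing C = B D² Bᵀ and forming m + σBDz with z standard normal.
Here's a sketch of just that step using the Eigen library; it's not taken from the reference
implementation, which among other things amortizes the decomposition over a whole generation of
samples rather than redoing it per candidate.

```cpp
#include <Eigen/Dense>
#include <random>

// One way to draw a single CMA-ES candidate point: decompose the covariance as
// C = B * D^2 * B^T, then sample x = m + sigma * B * D * z with z ~ N(0, I).
// The eigendecomposition is the O(n^3) step that dominates once function
// evaluations are cheap. (A real implementation reuses it across samples.)
Eigen::VectorXd sample_candidate(const Eigen::VectorXd& m, double sigma,
                                 const Eigen::MatrixXd& C, std::mt19937& rng) {
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(C);
    Eigen::MatrixXd B = es.eigenvectors();
    // Clamp tiny negative eigenvalues from roundoff before taking square roots.
    Eigen::VectorXd D = es.eigenvalues().cwiseMax(0.0).cwiseSqrt();
    std::normal_distribution<double> normal(0.0, 1.0);
    Eigen::VectorXd z(m.size());
    for (Eigen::Index i = 0; i < z.size(); ++i) z[i] = normal(rng);
    return m + sigma * (B * D.asDiagonal() * z);
}
```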