---
title: Problem Set 9 (Transformations)
---
I went rogue this week and did a deep dive on compressed sensing. So here's the compressed sensing
problem, followed by an exploration of compressed sensing CT reconstruction.
## 2
My code for all sections of this problem is
[here](https://gitlab.cba.mit.edu/erik/compressed_sensing). All the sampling and gradient descent are
done in C++ using [Eigen](http://eigen.tuxfamily.org) for vector and matrix operations. I use Python
and [matplotlib](https://matplotlib.org/) to generate the plots.
### (a)
{:.question}
Generate and plot a periodically sampled time series {$$t_j$$} of N points for the sum of two sine
waves at 697 and 1209 Hz, which is the DTMF tone for the number 1 key.
Here's a plot of 250 samples taken over one tenth of a second.
![samples](../assets/img/09_fig_a.png)
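Generating the samples is a one-liner per point. Here's a minimal sketch of the idea (not the repository code; the sample count and duration are the values stated above):

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <cstdio>

int main() {
    constexpr int n_samples = 250;    // N, as plotted above
    constexpr double duration = 0.1;  // seconds
    const double pi = 3.14159265358979323846;
    Eigen::VectorXd t(n_samples);
    for (int j = 0; j < n_samples; ++j) {
        double time = duration * j / n_samples;
        // sum of the two DTMF tones for the "1" key
        t(j) = std::sin(2.0 * pi * 697.0 * time) + std::sin(2.0 * pi * 1209.0 * time);
    }
    for (int j = 0; j < n_samples; ++j) {
        std::printf("%g\n", t(j));    // dump for plotting with matplotlib
    }
    return 0;
}
```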
### (b)
{:.question}
Calculate and plot the Discrete Cosine Transform (DCT) coefficients {$$f_i$$} for these data,
defined by their multiplication by the matrix $$f_i = \sum_{j = 0}^{N - 1} D_{ij} t_j$$, where
$$
\begin{align*}
D_{ij} =
\begin{cases}
\sqrt{\frac{1}{N}} &(i = 0)\\
\sqrt{\frac{2}{N}} \cos \left( \frac{\pi (2j + 1) i}{2 N} \right) &(1 \leq i \leq N - 1)
\end{cases}
\end{align*}
$$
![dct](../assets/img/09_fig_b.png)
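The matrix can be filled in directly from this definition. A sketch (a hypothetical `dct_matrix` helper, not the repository code):

```cpp
#include <Eigen/Dense>
#include <cmath>

// Build the N x N orthonormal DCT-II matrix defined above.
Eigen::MatrixXd dct_matrix(int n) {
    const double pi = 3.14159265358979323846;
    Eigen::MatrixXd d(n, n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == 0) {
                d(i, j) = std::sqrt(1.0 / n);
            } else {
                d(i, j) = std::sqrt(2.0 / n) * std::cos(pi * (2 * j + 1) * i / (2.0 * n));
            }
        }
    }
    return d;
}

// usage: Eigen::VectorXd f = dct_matrix(n_samples) * t;
```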
### (c)
{:.question}
Plot the inverse transform of the {$$f_i$$} by multiplying them by the inverse of the DCT matrix
(which is equal to its transpose) and verify that it matches the time series.
The original samples are recovered.
![recovered samples](../assets/img/09_fig_c.png)
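Since the DCT matrix is orthogonal, no explicit inversion is needed. A sketch of the round trip check, reusing the hypothetical `dct_matrix` above:

```cpp
#include <Eigen/Dense>
#include <cassert>

void check_round_trip(const Eigen::VectorXd& t) {
    const int n = static_cast<int>(t.size());
    Eigen::MatrixXd d = dct_matrix(n);
    Eigen::VectorXd f = d * t;                  // forward DCT
    Eigen::VectorXd t_rec = d.transpose() * f;  // inverse DCT, since D^-1 = D^T
    assert((t_rec - t).norm() < 1e-9 * t.norm());
}
```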
### (d)
{:.question}
Randomly sample and plot a subset of M points {$$t^\prime_k$$} of the {$$t_j$$}; you’ll later
investigate the dependence on the sample size.
Here I've selected 100 samples from the original 250. The plot is recognizable but very distorted.
![subset samples](../assets/img/09_fig_d.png)
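One simple way to draw the subset is to shuffle the index list and keep a prefix. A sketch (the `sample_indices` helper is a made-up name):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Choose m of the n sample indices uniformly at random, without replacement.
std::vector<int> sample_indices(int n, int m, std::mt19937& rng) {
    std::vector<int> indices(n);
    std::iota(indices.begin(), indices.end(), 0);      // 0, 1, ..., n - 1
    std::shuffle(indices.begin(), indices.end(), rng); // random permutation
    indices.resize(m);                                 // keep the first m
    std::sort(indices.begin(), indices.end());         // restore time order
    return indices;
}
```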
### (e)
{:.question}
Starting with a random guess for the DCT coefficients {$$f^\prime_i$$}, use gradient descent to
minimize the error at the sample points
$$
\min_{\{f^\prime_i\}} \sum_{k = 0}^{M - 1}
\left( t^\prime_k - \sum_{i = 0}^{N - 1} D_{i j_k} f^\prime_i \right)^2
$$
{:.question}
and plot the resulting estimated coefficients.
Gradient descent very quickly drives the loss function to zero. However, it doesn't reconstruct the
true DCT coefficients.
![recovered dct (unregularized)](../assets/img/09_fig_e.png)
To make sure I don't have a bug in my code, I plotted the samples we get by performing the inverse
DCT on the estimated coefficients.
![recovered samples (unregularized)](../assets/img/09_fig_e_2.png)
Sure enough, all samples in the subset are matched exactly, but the others are way off the mark.
We've added a lot of high frequency content and are obviously overfitting.
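For reference, here's a minimal sketch of this minimization with Eigen (not the repository code; `build_sensing_matrix` and `fit_unregularized` are made-up names). If $$j_k$$ is the original index of the $$k$$-th retained sample, the prediction at that point is $$\sum_i D_{i j_k} f^\prime_i$$, so the whole model collapses to a single matrix-vector product.

```cpp
#include <Eigen/Dense>
#include <vector>

// Row k of A is column j_k of D, so (A f)(k) predicts t'_k.
Eigen::MatrixXd build_sensing_matrix(const Eigen::MatrixXd& d,
                                     const std::vector<int>& indices) {
    Eigen::MatrixXd a(indices.size(), d.cols());
    for (int k = 0; k < static_cast<int>(indices.size()); ++k) {
        a.row(k) = d.col(indices[k]).transpose();
    }
    return a;
}

Eigen::VectorXd fit_unregularized(const Eigen::MatrixXd& a,
                                  const Eigen::VectorXd& t_sub,
                                  double rate, int max_iters) {
    Eigen::VectorXd f = Eigen::VectorXd::Random(a.cols()); // random initial guess
    for (int iter = 0; iter < max_iters; ++iter) {
        Eigen::VectorXd residual = t_sub - a * f;
        Eigen::VectorXd grad = -2.0 * a.transpose() * residual; // d(loss)/df
        if (grad.squaredNorm() < 1e-6) break;
        f -= rate * grad;
    }
    return f;
}
```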
### (f)
{:.question}
The preceding minimization is under-constrained; it becomes well-posed if a norm of the DCT
coefficients is minimized subject to a constraint of agreeing with the sampled points. One of the
simplest (but not best [Gershenfeld, 1999]) ways to do this is by adding a penalty term to the
minimization. Repeat the gradient descent minimization using the L2 norm:
$$
\min_{\{f^\prime_i\}} \sum_{k = 0}^{M - 1}
\left( t^\prime_k - \sum_{i = 0}^{N - 1} D_{i j_k} f^\prime_i \right)^2
+ \sum_{i = 0}^{N - 1} f^{\prime 2}_i
$$
{:.question}
and plot the resulting estimated coefficients.
With L2 regularization, we remove some of the high frequency content. This makes the real peaks a
little more prominent.
![recovered dct (L2 regularized)](../assets/img/09_fig_f.png)
However, it comes at a cost: gradient descent no longer drives the loss to zero. As such, the loss
itself isn't a good termination condition. In its place I terminate when the squared norm of the
gradient is less than $$\num{1e-6}$$. The final loss for the coefficients in the plot above is
around 50.
You can easily see that the loss is nonzero from the reconstructed samples.
![recovered samples (L2 regularized)](../assets/img/09_fig_f_2.png)
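The only change from the unregularized sketch above is the gradient, which picks up a $$2 f^\prime_i$$ term from the penalty. A sketch under the same assumptions (hypothetical names, not the repository code):

```cpp
#include <Eigen/Dense>

Eigen::VectorXd fit_l2(const Eigen::MatrixXd& a, const Eigen::VectorXd& t_sub,
                       double rate, int max_iters) {
    Eigen::VectorXd f = Eigen::VectorXd::Random(a.cols());
    for (int iter = 0; iter < max_iters; ++iter) {
        // the L2 penalty adds 2 f to the gradient of the data term
        Eigen::VectorXd grad = -2.0 * a.transpose() * (t_sub - a * f) + 2.0 * f;
        if (grad.squaredNorm() < 1e-6) break; // gradient norm termination, as above
        f -= rate * grad;
    }
    return f;
}
```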
### (g)
{:.question}
Repeat the gradient descent minimization using the L1 norm:
$$
\min_{\{f^\prime_i\}} \sum_{k = 0}^{M - 1}
\left( t^\prime_k - \sum_{i = 0}^{N - 1} D_{i j_k} f^\prime_i \right)^2
+ \sum_{i = 0}^{N - 1} \vert f^{\prime}_i \vert
$$
{:.question}
Plot the resulting estimated coefficients, compare to the L2 norm estimate, and compare the
dependence of the results on M to the Nyquist sampling limit of twice the highest frequency.
With L1 regularization the DCT coefficients are recovered pretty well. There is no added high
frequency noise.
![recovered dct (L1 regularized)](../assets/img/09_fig_g.png)
It still can't drive the loss to zero. Additionally, it's hard to drive the squared norm of the
gradient to zero, since each absolute value term contributes a gradient of 1 or -1. (Though to help
prevent oscillation, I drop this contribution if the absolute value of the coefficient in question
is less than $$\num{1e-3}$$.) So here I terminate when the relative change in the loss falls below
$$\num{1e-9}$$. I also decay the learning rate during optimization: it starts at 0.1 and is
multiplied by 0.99 every 64 iterations.
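Putting those pieces together, here's a sketch of the L1 loop with the deadzone subgradient, relative loss termination, and learning rate decay just described (again not the repository code; `fit_l1` reuses the hypothetical sensing matrix from before):

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <limits>

Eigen::VectorXd fit_l1(const Eigen::MatrixXd& a, const Eigen::VectorXd& t_sub,
                       int max_iters) {
    Eigen::VectorXd f = Eigen::VectorXd::Random(a.cols());
    double rate = 0.1;
    double prev_loss = std::numeric_limits<double>::infinity();
    for (int iter = 0; iter < max_iters; ++iter) {
        Eigen::VectorXd residual = t_sub - a * f;
        Eigen::VectorXd grad = -2.0 * a.transpose() * residual;
        for (int i = 0; i < f.size(); ++i) {
            // subgradient of |f_i|, dropped near zero to prevent oscillation
            if (std::abs(f(i)) >= 1e-3) grad(i) += (f(i) > 0.0) ? 1.0 : -1.0;
        }
        double loss = residual.squaredNorm() + f.lpNorm<1>();
        if (std::abs(prev_loss - loss) < 1e-9 * loss) break; // relative loss change
        prev_loss = loss;
        f -= rate * grad;
        if ((iter + 1) % 64 == 0) rate *= 0.99; // decay every 64 iterations
    }
    return f;
}
```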
The final loss is around 40, better than we found with L2 regularization. However, it took longer
to converge: this version stopped after 21,060 iterations, as opposed to 44 (for L2) or 42
(unregularized). If I stop it after 50 iterations it's a bit worse than the L2 version (loss of 55
instead of 50); it's not until roughly 2,500 iterations that it unequivocally pulls ahead. I
played with learning rates and decay schedules a bit, but there might be more room for improvement.
The recovered samples are also much more visually recognizable. The waveform's amplitude seems a
bit diminished overall, but unlike our previous attempts, it looks similar to the original.
![recovered samples (L1 regularized)](../assets/img/09_fig_g_2.png)
This technique can recover the signal substantially below the Nyquist limit. The highest frequency
in the signal is 1209 Hz, so with traditional techniques we'd have to sample at 2418 Hz or faster to
avoid artifacts. Since I'm only sampling over one tenth of a second, I thus need at least 242
samples. So my original 250 is (not coincidentally) just above this limit. But even with a subset of
only 50 samples, the L1 regularized gradient descent does an admirable job of recovering the DCT
coefficients and samples:
![recovered dct (from 50)](../assets/img/09_fig_g_3.png)
![recovered samples (from 50)](../assets/img/09_fig_g_4.png)