### Answer the compressed sensing questions

parent 9640c329
_psets/09.md 0 → 100644
 --- title: Problem Set 9 (Transformations) --- I went rogue this week and did a deep dive on compressed sensing. So here's the compressed sensing problem, followed by an exploration of compressed sensing CT reconstruction. ## 2 My code for all sections of this problem is [here](https://gitlab.cba.mit.edu/erik/compressed_sensing). All the sampling and gradient descent is done in C++ using [Eigen](http://eigen.tuxfamily.org) for vector and matrix operations. I use Python and [matplotlib](https://matplotlib.org/) to generate the plots. ### (a) {:.question} Generate and plot a periodically sampled time series {\$\$t_j\$\$} of N points for the sum of two sine waves at 697 and 1209 Hz, which is the DTMF tone for the number 1 key. Here's a plot of 250 samples taken over one tenth of a second. ![samples](../assets/img/09_fig_a.png) ### (b) {:.question} Calculate and plot the Discrete Cosine Transform (DCT) coefficients {\$\$f_i\$\$} for these data, defined by their multiplication by the matrix \$\$f_i = \sum_{j = 0}^{N - 1} D_{ij} t_j\$\$, where \$\$ \begin{align*} D_{ij} = \begin{cases} \sqrt{\frac{1}{N}} &(i = 0)\\ \sqrt{\frac{2}{N}} \cos \left( \frac{\pi (2j + 1) i}{2 N} \right) &(1 \leq i \leq N - 1) \end{cases} \end{align*} \$\$ ![dct](../assets/img/09_fig_b.png) ### (c) {:.question} Plot the inverse transform of the {\$\$f_i\$\$} by multiplying them by the inverse of the DCT matrix (which is equal to its transpose) and verify that it matches the time series. The original samples are recovered. ![recovered samples](../assets/img/09_fig_c.png) ### (d) {:.question} Randomly sample and plot a subset of M points {\$\$t^\prime_k\$\$} of the {\$\$t_j\$\$}; you’ll later investigate the dependence on the sample size. Here I've selected 100 samples from the original 250. The plot is recognizable but very distorted. ![subset samples](../assets/img/09_fig_d.png) ### (e) {:.question} Starting with a random guess for the DCT coefficients {\$\$f^\prime_i\$\$}, use gradient descent to minimize the error at the sample points \$\$ \min_{\{f^\prime_i\}} \sum_{k = 0}^{M - 1} \left( t^\prime_k - \sum_{j = 0}^{N - 1} D_{ij} f^\prime_i \right)^2 \$\$ {:.question} and plot the resulting estimated coefficients. Gradient descent very quickly drives the loss function to zero. However it's not reconstructing the true DCT coefficients. ![recovered dct (unregularized)](../assets/img/09_fig_e.png) To make sure I don't have a bug in my code, I plotted the samples we get by performing the inverse DCT on the estimated coefficients. ![recovered samples (unregularized)](../assets/img/09_fig_e_2.png) Sure enough all samples in the subset are matched exactly. But the others are way off the mark. We've added a lot of high frequency content, and are obviously overfitting. ### (f) {:.question} The preceding minimization is under-constrained; it becomes well-posed if a norm of the DCT coefficients is minimized subject to a constraint of agreeing with the sampled points. One of the simplest (but not best [Gershenfeld, 1999]) ways to do this is by adding a penalty term to the minimization. Repeat the gradient descent minimization using the L2 norm: \$\$ \min_{\{f^\prime_i\}} \sum_{k = 0}^{M - 1} \left( t^\prime_k - \sum_{j = 0}^{N - 1} D_{ij} f^\prime_i \right)^2 + \sum_{i = 0}^{N - 1} f^{\prime 2}_i \$\$ {:.question} and plot the resulting estimated coefficients. With L2 regularization, we remove some of the high frequency content. This makes the real peaks a little more prominent. ![recovered dct (L2 regularized)](../assets/img/09_fig_f.png) However it comes at a cost: gradient descent no longer drive the loss to zero. As such the loss itself isn't a good termination condition. In its place I terminate when the squared norm of the gradient is less than \$\$\num{1e-6}\$\$. The final loss for the coefficients in the plot above is around 50. You can easily see that the loss is nonzero from the reconstructed samples. ![recovered samples (L2 regularized)](../assets/img/09_fig_f_2.png) ### (g) {:.question} Repeat the gradient descent minimization using the L1 norm: \$\$ \min_{\{f^\prime_i\}} \sum_{k = 0}^{M - 1} \left( t^\prime_k - \sum_{j = 0}^{N - 1} D_{ij} f^\prime_i \right)^2 + \sum_{i = 0}^{N - 1} \vert f^{\prime}_i \vert \$\$ {:.question} Plot the resulting estimated coefficients, compare to the L2 norm estimate, and compare the dependence of the results on M to the Nyquist sampling limit of twice the highest frequency. With L1 regularization the DCT coefficients are recovered pretty well. There is no added high frequency noise. ![recovered dct (L1 regularized)](../assets/img/09_fig_g.png) It still can't drive the loss to zero. Additionally it's hard to drive the squared norm of the gradient to zero, since the gradient of the absolute values shows up as 1 or -1. (Though to help prevent oscillation I actually drop this contribution if the absolute value of the coefficient in question is less than \$\$\num{1e-3}\$\$.) So here I terminate when the relative change in the loss falls below \$\$\num{1e-9}\$\$. I also decay the learning rate during optimization. It starts at 0.1 and is multiplied by 0.99 every 64 iterations. The final loss is around 40, so better than we found with L2 regularization. However it did take longer to converge: this version stopped after 21,060 iterations, as opposed to 44 (for L2) or 42 (for unregularized). If I stop it after 50 samples it's a bit worse than the L2 version (loss of 55 instead of 50). It's not until roughly 2500 iterations that it's unequivocally pulled ahead. I played with learning rates and decay schedules a bit, but there might be more room for improvement. The recovered samples are also much more visually recognizable. The amplitude of our waveform seems overall a bit diminished, but unlike our previous attempts it looks similar to the original. ![recovered samples (L1 regularized)](../assets/img/09_fig_g_2.png) This technique can recover the signal substantially below the Nyquist limit. The highest frequency signal is 1209 Hz, so with traditional techniques we'd have to sample at 2418 Hz or faster to avoid artifacts. Since I'm only plotting over one hundreth of a second, I thus need at least 242 samples. So my original 250 is (not coicidentally) near here. But even with a subset of only 50 samples, the L1 regularized gradient descent does an admirable job at recovering the DCT coefficients and samples: ![recovered dct (from 50)](../assets/img/09_fig_g_3.png) ![recovered samples (from 50)](../assets/img/09_fig_g_4.png)

43.4 KB

21.8 KB

44 KB

41.8 KB

50.1 KB

65.5 KB

45.8 KB

42.6 KB

20.1 KB

47.3 KB

18.9 KB

45.6 KB

Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment