Tuesday, April 29, 2014

Steepest gradient descent

Steepest gradient descent

Steepest gradient descent is an algorithm for optimizing quadratic forms. The idea is, given a wok-shaped convex surface, if you start at a certain point and each time you move in the direction of fastest descent, then, step by step, you will eventually get to the lowest point.

We start with the definition of quadratic forms:

$f'(x)$ is the gradient vector, i.e. it points to the direction of fastest ascent. We want the opposite, so let $r(x) = -f'(x) = b - Ax$.

You might ask, why not just invert A and solve the equation to get the optimal $x$? In simple cases this is indeed possible, but here we use an iterative approach for it's very useful in more general situations.

Back to steepest descent method. First we choose a random start point $x_{(0)}$, and descend in the direction of $r_{(0)}$, i.e.

\begin{align*} x_{(1)} &= x_{(0)} + \alpha r_{(0)} \\ &= x_{(0)} + \alpha (b - Ax_{(0)}) \\ \end{align*}

Everything is settled, except that we don't know $\alpha$, i.e. we known the step size to take. To maximize efficiency, we want to reach the lowest point possible in every step. This is solved this way:

Note that $\frac{d}{d\alpha}x_{(1)} =\frac{d}{d\alpha}(x_{(0)} + \alpha r_0) = r_0$.

This means the gradient at the next point should be perpendicular to the gradient at the current point! (Hence the zigzag in the first figure.) The calculation of $\alpha$ easily follows:

Now all pieces of the puzzle are here, we can have the big picture:

The computational burden (in terms of matrix multiplication, which is the bulky part) can be reduced by half if you use a small trick:

So, here is the first few iterations of the algorithm:

\begin{align*} \hline \\ r_{(0)} &= b - Ax_{(0)} \\ \alpha_{(0)} &= \frac{r_{(0)}^T r_{(0)}}{r_{(0)}^T A r_{(0)}} \\ x_{(1)} &= x_{(0)} + \alpha_{(0)} r_{(0)} \\ \hline \\ r_{(1)} &= r_{(0)} - \alpha_{0}Ar_{(0)} \\ \alpha_{(1)} &= \frac{r_{(1)}^T r_{(1)}}{r_{(1)}^T A r_{(1)}} \\ x_{(2)} &= x_{(1)} + \alpha_{(1)} r_{(1)} \\ \hline \\ r_{(2)} &= r_{(1)} - \alpha_{1}Ar_{(1)} \\ \alpha_{(2)} &= \frac{r_{(2)}^T r_{(2)}}{r_{(2)}^T A r_{(2)}} \\ x_{(3)} &= x_{(2)} + \alpha_{(2)} r_{(2)} \\ \hline \\ \vdots \end{align*}

The order of calculation and reuse of variables is illustrated here: