## VC dimension of perceptron

The perceptron is a simple classification algorithm which separates data points by a hyperplane. With 2D data, the hyperplane is just a line:

Mathematically, \begin{align*} \hat{y_i} &= \sum_{j=0}^n w_j x_{ij} \end{align*}

Or in matrix form: $Y = Xw$.

where $x_{0j}$ is constantly 1, so that the hyperplane does not have to pass the origin.

VC dimension is the maximum number of data points for which all possible hypotheses can be formed. i.e. $M = 2^N$. (see this post)

Here we argue that for the perceptron algorithm, the VC dimension is $d+1$, where $d$ is the number of dimensions of the data. i.e. For $d$-dimension data, we can have at most $d+1$ data points and still be able to generate all hypotheses.

The reason is that when the number of data points is $d+1$, then we can find a invertible data matrix X as below, and whatever the response matrix Y is, we can solve for w: $$w = X^{-1}Y$$

But if we increase the number of data points just by 1, then X cannot be invertible, so $d+2$ is a breakpoint.