The perceptron is a simple classification algorithm that separates data points with a hyperplane. With 2D data, the hyperplane is just a line.
Mathematically, \begin{align*} \hat{y}_i &= \operatorname{sign}\left(\sum_{j=0}^{d} w_j x_{ij}\right) \end{align*}
Or in matrix form: $\hat{Y} = \operatorname{sign}(Xw)$,
where $x_{i0}$ is constantly $1$, so that the hyperplane does not have to pass through the origin.
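As a minimal sketch of this prediction rule (the helper name and numpy usage are mine, not from the original post):

```python
import numpy as np

def perceptron_predict(X_raw, w):
    """X_raw: (N, d) data matrix; w: (d+1,) weights, w[0] is the bias.

    Hypothetical helper illustrating y_hat_i = sign(sum_j w_j x_ij).
    """
    # Prepend the constant column x_{i0} = 1 so the hyperplane
    # does not have to pass through the origin.
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
    return np.sign(X @ w)
```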
The VC dimension is the maximum number of data points $N$ for which all possible hypotheses can be formed, i.e. all $M = 2^N$ dichotomies (see this post). Here we argue that for the perceptron algorithm, the VC dimension is $d+1$, where $d$ is the dimension of the data: with $d$-dimensional data, we can have at most $d+1$ data points and still be able to generate all hypotheses.
The reason is that when the number of data points is $d+1$, we can choose the points so that the $(d+1) \times (d+1)$ data matrix $X$ is invertible, e.g. the origin plus the $d$ standard basis vectors, each row prefixed with the bias coordinate $1$:
$$X = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix}$$
Then whatever the response vector $Y \in \{-1, +1\}^{d+1}$ is, we can solve for $w$: $$w = X^{-1}Y$$ Since $Xw$ equals $Y$ exactly, in particular $\operatorname{sign}(Xw) = Y$, so all $2^{d+1}$ dichotomies are realized.
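To make this concrete, here is a small numerical check (not from the original post; plain numpy, with $d = 3$): build the matrix $X$ above and verify that solving $w = X^{-1}Y$ realizes every one of the $2^{d+1}$ dichotomies.

```python
import numpy as np
from itertools import product

d = 3
# Rows: the origin and the d standard basis vectors, with bias column 1.
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])

for y in product([-1.0, 1.0], repeat=d + 1):
    y = np.array(y)
    w = np.linalg.solve(X, y)                  # w = X^{-1} Y
    assert np.array_equal(np.sign(X @ w), y)   # sign(Xw) = Y

print(f"all {2 ** (d + 1)} dichotomies realized")
```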
But if we increase the number of data points by just one, $X$ becomes a $(d+2) \times (d+1)$ matrix, so its rows must be linearly dependent: some point satisfies $x_k = \sum_{i \neq k} a_i x_i$ with not all $a_i$ zero. For any $w$ we then have $x_k \cdot w = \sum_{i \neq k} a_i (x_i \cdot w)$, so the dichotomy with $y_i = \operatorname{sign}(a_i)$ for each $a_i \neq 0$ and $y_k = -1$ cannot be realized. Hence $d+2$ is a break point.
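The dependence argument can also be checked numerically. A sketch under my own setup (random points; names mine, not from the post): find a nonzero $a$ with $a^\top X = 0$ and observe that the signs of $a$ give a dichotomy no $w$ can produce.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
pts = rng.normal(size=(d + 2, d))          # d+2 points in d dimensions
X = np.hstack([np.ones((d + 2, 1)), pts])  # (d+2, d+1) with bias column

# Left null space of X: a nonzero a with a @ X = 0 must exist,
# because d+2 rows cannot be independent in d+1 dimensions.
_, _, vt = np.linalg.svd(X.T)
a = vt[-1]
assert np.allclose(a @ X, 0)

# For any w, a @ (X @ w) = 0. If sign(x_i . w) matched sign(a_i) for
# every i with a_i != 0, each term a_i * (x_i . w) would be positive
# and the sum could not be zero. So the dichotomy y_i = sign(a_i)
# is unrealizable, confirming d+2 is a break point.
print("unrealizable dichotomy:", np.sign(a).astype(int))
```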