---
title: "Principal Component Analysis"
type: concept
related: [Principal Component Analysis, Eigenvalue, Singular Value Decomposition]
source: https://www.jemoka.com/posts/kbhprinciple_component_analysis/
confidence: high
status: active
---

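assumptions of PCA
- PCA makes sense only when the structure in the data is (approximately) linear.
- Directions of large variation carry important structure (i.e. they are not noise).
motivation
Consider a dataset \(\qty {x^{(i)}}_{i=1}^{n}\) such that \(x^{(i)} \in \mathbb{R}^{d}\), where \(d < n\). Our goal is to automatically identify correlations between the axes (features) of the data. This boils down to two goals:
- automatically detect and remove redundancies;
- find features that explain the variation.
preprocessing before PCA
Normalize each feature: \(x_{j}^{(i)} \leftarrow \frac{x^{(i)}_{j} - \mu_{j}}{\sigma_{j}}\), where \(\mu_{j}\) and \(\sigma_{j}\) are the mean and standard deviation of feature \(j\) over the dataset. After this step every feature has mean zero and unit variance.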
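A minimal NumPy sketch of the normalization step. It assumes the data sits in an `(n, d)` array with one row per point \(x^{(i)}\) and uses the population standard deviation (`ddof=0`). Both are choices made here, since the note does not specify them. `standardize` is a hypothetical helper name.

```python
import numpy as np

def standardize(X):
    """Z-score each feature (column) of an (n, d) data matrix.

    Rows are data points x^(i); columns are features j.
    A constant feature (sigma_j = 0) would cause a division by zero.
    """
    mu = X.mean(axis=0)      # mu_j: per-feature mean
    sigma = X.std(axis=0)    # sigma_j: per-feature population standard deviation
    return (X - mu) / sigma

# Made-up data whose three features live on very different scales
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1], size=(200, 3))
Z = standardize(X)
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))
```

Without this step, the feature with scale 10 would dominate the variance and hence the first principal direction.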
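derivation
Find a direction \(u\) such that, if the data is projected onto \(u\), the variance of the projected data is maximized. Namely, for a unit vector \(u\) along that direction, we want to find the \(u\) that maximizes the variance of the projections \({x^{(i)}}^{T} u\). Because the normalized data has mean zero, this variance is just their mean square: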
\begin{align} \frac{1}{n} \sum_{i=1}^{n} \qty({x^{(i)}}^{T} u)^{2} &= \frac{1}{n} \sum_{i=1}^{n} u^{T} x^{(i)} {x^{(i)}}^{T} u \\ &= u^{T} \qty(\frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{T}) u \end{align}
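The identity above can be checked numerically on made-up correlated data (rows are points, and the data is mean-centred first). The mean of the squared projections, the quadratic form \(u^{T} C u\), and the variance of the projected data should all agree. Here `C` is just a local name for \(\frac{1}{n}\sum_i x^{(i)} {x^{(i)}}^{T}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 4
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated features; rows are x^(i)
X = X - X.mean(axis=0)                                  # mean-centre, as normalization does

u = rng.normal(size=d)
u /= np.linalg.norm(u)                                  # an arbitrary unit vector

C = X.T @ X / n                                         # (1/n) * sum_i x^(i) x^(i)^T
lhs = np.mean((X @ u) ** 2)                             # (1/n) * sum_i (x^(i)^T u)^2
rhs = u @ C @ u                                         # u^T C u
var = np.var(X @ u)                                     # variance of the projected data
print(lhs, rhs, var)
```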
Notice that the maximizing choice of \(u\) is the principal eigenvector of \(\frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{T}\), i.e. the eigenvector with the largest eigenvalue. (An eigenvector is a vector that the matrix only stretches, without rotating.) To show this for yourself, use the method of Lagrange multipliers.
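A sketch of that Lagrange-multiplier step follows. It writes \(C = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{T}\) for short and enforces the constraint \(u^{T} u = 1\). The symbols \(C\), \(\lambda\) and \(\mathcal{L}\) are introduced here for the derivation.
\begin{align} \mathcal{L}(u, \lambda) &= u^{T} C u - \lambda \qty(u^{T} u - 1) \\ \nabla_{u} \mathcal{L} &= 2 C u - 2 \lambda u = 0 \implies C u = \lambda u \\ u^{T} C u &= \lambda \, u^{T} u = \lambda \end{align}
The gradient step uses the symmetry of \(C\). Every stationary point is therefore a unit eigenvector of \(C\), and the objective at that point equals its eigenvalue. The maximum is attained at the eigenvector with the largest eigenvalue.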
Once you have \(u\), you can project a point \(x\) onto it:
\begin{equation} x \approx \qty(x^{T}u )u \end{equation}
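A small NumPy sketch of finding \(u\) and projecting onto it. The 2-D data is made up so that it spreads mostly along the direction \((1, 1)\), so the recovered principal eigenvector should be close to \(\pm(1, 1)/\sqrt{2}\). `np.linalg.eigh` is used because the matrix is symmetric; it returns eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
t = rng.normal(size=n)
# Points spread mostly along (1, 1), plus a little isotropic noise; rows are x^(i)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(n, 2))
X = X - X.mean(axis=0)

C = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues, orthonormal eigenvectors
u = eigvecs[:, -1]                      # principal eigenvector (largest eigenvalue)

proj = X @ u                            # scalar coordinates x^T u, one per point
X_approx = np.outer(proj, u)            # x ~ (x^T u) u, one row per point
print(u)
```

The sign of `u` is arbitrary: \(-u\) spans the same direction and gives the same approximation.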
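To keep more than one dimension, take the \(k\) eigenvectors \(u_{1}, \dots, u_{k}\) with the largest eigenvalues, in decreasing order of eigenvalue. They can be chosen orthonormal because the matrix is symmetric. Then project onto each of them:
\begin{equation} x \approx \sum_{j=1}^{k} \qty(x^{T}u_{j}) u_{j} \end{equation}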
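A sketch of the \(k\)-component version, with `pca_project` as a hypothetical helper name and rows of an already-centred `(n, d)` array as the points. The eigenvectors are re-sorted into decreasing eigenvalue order, and the top \(k\) become the columns of `U_k`.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of a centred (n, d) matrix onto the top-k eigenvectors.

    Returns (coords, X_approx, U_k) where coords[i, j] = x^(i)^T u_j and
    X_approx[i] = sum_j (x^(i)^T u_j) u_j.
    """
    n = X.shape[0]
    C = X.T @ X / n
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # indices in decreasing eigenvalue order
    U_k = eigvecs[:, order[:k]]            # columns u_1, ..., u_k
    coords = X @ U_k                       # (n, k) projection coefficients
    X_approx = coords @ U_k.T              # map the coefficients back into R^d
    return coords, X_approx, U_k

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)
coords, X_approx, U_k = pca_project(X, k=2)
```

With \(k = d\) nothing is discarded, so the reconstruction is exact.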
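connecting SVD and PCA
Recall that in PCA we need the eigenvectors of the matrix:
\begin{equation} \qty(\frac{1}{n} \sum_{i=1}^{n} x^{(i)} {x^{(i)}}^{T}) \end{equation}
If we stack the data points as the columns of \(X \in \mathbb{R}^{d \times n}\), this matrix is exactly \(\frac{1}{n} X X^{T}\). The factor \(\frac{1}{n}\) rescales the eigenvalues without changing the eigenvectors. Now write \(X\) via its SVD: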
\begin{equation} X = U S V^{T} \end{equation}
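Its transpose is:
\begin{equation} X^{T} = V S^{T} U^{T} \end{equation}
which, using \(V^{T} V = I\), gives:
\begin{align} X X^{T} &= U S V^{T} V S^{T} U^{T} \\ &= U S S^{T} U^{T} \\ &= \qty(US) \qty(US)^{T} \end{align}
Since \(U\) is orthogonal and \(S S^{T}\) is diagonal, the second line is an eigendecomposition of \(X X^{T}\). The columns of \(U\) (the left singular vectors of \(X\)) are its eigenvectors, and the eigenvalues are the squared singular values. This is nice because we can then just use the SVD of \(X\) to get the PCA vectors, without ever forming \(X X^{T}\).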
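A numerical check of this connection, using the section's convention that the columns of `X` are the data points (the data is made up). The eigendecomposition of \(\frac{1}{n} X X^{T}\) and the thin SVD of \(X\) should give the same directions up to sign, with eigenvalues equal to the squared singular values divided by \(n\).

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 4, 200
X = rng.normal(size=(d, d)) @ rng.normal(size=(d, n))   # columns are the data points x^(i)
X = X - X.mean(axis=1, keepdims=True)                     # centre each feature (row)

# Route 1: eigendecomposition of (1/n) X X^T, re-sorted to descending order
eigvals, eigvecs = np.linalg.eigh(X @ X.T / n)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Route 2: thin SVD of X; singular values come back in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

print(np.allclose(s ** 2 / n, eigvals))                   # eigenvalues = squared singular values / n
print(np.allclose(np.abs(U.T @ eigvecs), np.eye(d), atol=1e-6))  # same directions up to sign
```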
error
The reconstruction error of PCA for a data point \(x^{(i)}\) is the residual left after projecting onto the top \(k\) eigenvectors:
\begin{equation} x^{(i)} - \sum_{j=1}^{k} \qty({x^{(i)}}^{T}u_{j}) u_{j} \end{equation}
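The formula gives a residual vector per point. A common scalar summary, chosen here rather than taken from the note, is the mean squared norm of these residuals. With orthonormal eigenvectors of \(\frac{1}{n}\sum_i x^{(i)} {x^{(i)}}^{T}\), it equals the sum of the discarded eigenvalues. The sketch below checks this on made-up data (rows are points).

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 400, 6, 2
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # rows are x^(i)
X = X - X.mean(axis=0)

C = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]      # descending eigenvalue order
U_k = eigvecs[:, :k]                                     # top-k eigenvectors as columns

residuals = X - (X @ U_k) @ U_k.T                        # x^(i) - sum_j (x^(i)^T u_j) u_j, per row
mse = np.mean(np.sum(residuals ** 2, axis=1))            # mean squared reconstruction error
print(mse, eigvals[k:].sum())                            # should agree
```

The residuals are orthogonal to every kept direction \(u_{j}\), which is why their squared length is exactly the variance left in the discarded directions.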
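applications
- visualization
- preprocessing, to reduce the number of dimensions
- reducing noise in the data (by discarding the low-variance axes)
some recent work
Candès, Li, Ma, Wright: “Robust PCA”. Classical PCA can be phrased as minimizing \(\| X - L \|\) such that \(\text{rank}\qty(L) \leq K\), which is sensitive to grossly corrupted entries. Robust PCA instead splits \(X = L + S\) into a low-rank part \(L\) and a sparse part \(S\) by minimizing \(\| L \|_{*} + \lambda \| S \|_{1}\) subject to \(L + S = X\). Here \(\| L \|_{*}\) is the nuclear norm (the sum of the singular values of \(L\)), and \(\| S \|_{1}\) is the entrywise \(\ell_{1}\) norm (the sum of the absolute values of the entries of \(S\)), which encourages \(S\) to be sparse.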
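A minimal sketch of solving the Robust PCA objective with an alternating-direction (ADMM / augmented Lagrangian) scheme. It alternates singular value thresholding for \(L\) with entrywise soft-thresholding for \(S\). The solver choice is an assumption, not something the note specifies. \(\lambda = 1/\sqrt{\max(m, n)}\) is the value suggested in the Robust PCA paper, while the step size `mu` is a commonly used heuristic; treat both as tunable assumptions. `robust_pca`, `svt` and `shrink` are hypothetical helper names.

```python
import numpy as np

def shrink(M, tau):
    """Entrywise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt          # scale column j of U by the shrunk s_j

def robust_pca(X, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Minimize ||L||_* + lam * ||S||_1 subject to L + S = X (ADMM sketch)."""
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # assumed default, from the Robust PCA paper
    if mu is None:
        mu = m * n / (4.0 * np.abs(X).sum())  # assumed heuristic step size
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                      # dual variable for the constraint L + S = X
    for _ in range(max_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = shrink(X - L + Y / mu, lam / mu)
        residual = X - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(X):
            break
    return L, S

# Made-up test problem: rank-2 matrix with about 5% of entries grossly corrupted
rng = np.random.default_rng(6)
m = 100
L0 = rng.normal(size=(m, 2)) @ rng.normal(size=(2, m))
S0 = np.zeros((m, m))
mask = rng.random((m, m)) < 0.05
S0[mask] = rng.uniform(-10, 10, size=mask.sum())
L, S = robust_pca(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))   # relative recovery error of the low-rank part
```

Truncated SVD, which solves the classical rank-constrained problem, would spread the corrupted entries across all of its components. The \(\ell_{1}\) term instead lets them be absorbed into \(S\).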