These are my notes on the mathematical foundations of a dimensionality reduction method called Principal Component Analysis (PCA), taken while attending CSCI-UA 9473 - Foundations of Machine Learning at NYU Paris. They use linear algebra and statistics to formalize the concept of PCA.
Multivariate Statistics & Notation
Let $X \in \mathbb{R}^d$ be a random vector. We will use the superscript notation $X^{(j)}$, for $j = 1, \dots, d$, to denote the components of $X$.
The expectation of $X$ is defined as:

$$\mathbb{E}[X] = \begin{pmatrix} \mathbb{E}[X^{(1)}] \\ \vdots \\ \mathbb{E}[X^{(d)}] \end{pmatrix}$$
Similarly, the covariance matrix of $X$, denoted by $\Sigma$, is a $d \times d$ matrix defined such that:

$$\Sigma_{jk} = \operatorname{Cov}\left(X^{(j)}, X^{(k)}\right) = \mathbb{E}\left[\left(X^{(j)} - \mathbb{E}[X^{(j)}]\right)\left(X^{(k)} - \mathbb{E}[X^{(k)}]\right)\right]$$
We can write the whole covariance matrix in the following vectorized form:

$$\Sigma = \mathbb{E}\left[(X - \mathbb{E}[X])(X - \mathbb{E}[X])^\top\right] = \mathbb{E}[XX^\top] - \mathbb{E}[X]\,\mathbb{E}[X]^\top$$
Note: This is because, expanding the product and using linearity of expectation,

$$\mathbb{E}\left[(X - \mathbb{E}[X])(X - \mathbb{E}[X])^\top\right] = \mathbb{E}[XX^\top] - \mathbb{E}[X]\,\mathbb{E}[X]^\top - \mathbb{E}[X]\,\mathbb{E}[X]^\top + \mathbb{E}[X]\,\mathbb{E}[X]^\top = \mathbb{E}[XX^\top] - \mathbb{E}[X]\,\mathbb{E}[X]^\top$$
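The equivalence of the centered and expanded forms of the covariance matrix can be checked numerically. Below is a minimal NumPy sketch (the sample size, dimension, and sampling distribution are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n observations of a d-dimensional random vector X (rows = samples).
# Multiplying by a random matrix introduces correlation between components.
n, d = 10_000, 3
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))

# Empirical covariance via the centered definition:
# E[(X - E[X]) (X - E[X])^T]
mean = X.mean(axis=0)
centered = X - mean
cov_centered = centered.T @ centered / n

# Empirical covariance via the expanded form:
# E[X X^T] - E[X] E[X]^T
cov_expanded = X.T @ X / n - np.outer(mean, mean)

# The two estimates agree up to floating-point error.
print(np.allclose(cov_centered, cov_expanded))
```

Both expressions estimate the same $d \times d$ matrix; the expanded form is sometimes convenient because it avoids an explicit centering pass over the data.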