
The “curse of dimensionality” refers to various challenges that arise as the number of dimensions in a dataset increases, such as exponential growth in data space, sparsity of data, loss of meaningful distance metrics, increased computational complexity, and higher risk of overfitting. Principal Component Analysis (PCA) is a powerful tool to combat these challenges by reducing the dimensionality of data while retaining most of the original variance. This note explores PCA in depth, including its mathematical foundation, applications, and practical considerations.

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variance as possible. It transforms the original variables into a new set of variables through linear combinations. These new variables, called principal components, are orthogonal to each other and ranked according to the variance of data along them.

Computing PCA

The first step in the PCA algorithm is to standardize the data and then compute its covariance matrix. Standardizing is crucial because PCA is sensitive to the variances of the initial variables: features measured on larger scales would otherwise dominate the components. The core of PCA then lies in the eigenvalue decomposition of the covariance matrix. This decomposition extracts the eigenvalues and the corresponding eigenvectors: the eigenvectors represent the directions of maximum variance and form the new axes, while the eigenvalues indicate the amount of variance captured by each principal component.

Given the covariance matrix \( C \), we find its eigenvalues \( \lambda \) and eigenvectors \( v \). These satisfy the equation:

$$ C v = \lambda v $$

For a refresher on eigenvalues and eigenvectors, see an introductory linear algebra reference.

  • Eigenvectors \( v \): These are the directions of the axes of the new feature space, and PCA uses them to transform the data into a new coordinate system. Because the covariance matrix is symmetric, its eigenvectors are mutually orthogonal, and they align with the directions of maximum variance.
  • Eigenvalues \( \lambda \): These represent the amount of variance that is accounted for by each of the new axes. A higher eigenvalue corresponds to a direction with more variance.
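
As a rough sketch of these first steps, the snippet below standardizes a small synthetic dataset, computes its covariance matrix with NumPy, and extracts the eigenpairs; the data and variable names are invented purely for illustration.

```python
import numpy as np

# Synthetic data: 200 samples, 3 features with some correlation (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]

# Standardize each feature to zero mean and unit variance before computing covariance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of the standardized data (features as columns)
C = np.cov(X_std, rowvar=False)

# Eigenvalue decomposition; eigh suits symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sanity check of the defining equation C v = lambda v for one eigenpair
v, lam = eigenvectors[:, 0], eigenvalues[0]
assert np.allclose(C @ v, lam * v)
```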

The eigenvectors are sorted by their corresponding eigenvalues in descending order. This order determines the importance of the eigenvectors. Principal components are selected based on the highest eigenvalues, which means choosing the directions where the data varies the most.
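
A minimal sketch of this sorting step, again on synthetic data: the eigenpairs are reordered by descending eigenvalue, and each eigenvalue is divided by the total to estimate the share of variance its component explains.

```python
import numpy as np

# Stand-in covariance matrix built from synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_std = rng.normal(size=(200, 3))
C = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)   # ascending order by default

# Reorder eigenpairs by descending eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of total variance captured by each principal component
explained_variance_ratio = eigenvalues / eigenvalues.sum()
print(explained_variance_ratio)   # largest share first
```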

The original data can then be projected onto the space spanned by the selected eigenvectors. This transformation is achieved by multiplying the original dataset \( X \) by the matrix of selected eigenvectors \( V \):

$$ Y = X V $$

Here, \( Y \) represents the data transformed into the new coordinate system defined by the principal components.
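
The projection step might look like the following sketch, where the two eigenvectors with the largest eigenvalues form the matrix \( V \) and the standardized data is mapped into the new coordinate system.

```python
import numpy as np

# Synthetic standardized data and its eigendecomposition (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] += X[:, 0]                              # add correlation so PCA has structure to find
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# V holds the two eigenvectors with the largest eigenvalues, one per column
order = np.argsort(eigenvalues)[::-1]
V = eigenvectors[:, order[:2]]

# Project the standardized data onto the principal components: Y = X V
Y = X_std @ V
print(Y.shape)   # (200, 2): each row is a sample in the new coordinate system
```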

The number of principal components retained depends on how much of the total variance one wishes to preserve. Components are ranked by their eigenvalues in descending order, and a subset is chosen such that a significant proportion of the total variance is covered; projecting the original data onto only these retained eigenvectors, as in the transformation above, yields the reduced representation.
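
One common heuristic, sketched below with made-up eigenvalues, is to keep the smallest number of components whose cumulative explained variance crosses a chosen threshold (95% here, purely as an example).

```python
import numpy as np

# Eigenvalues already sorted in descending order (illustrative values)
eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.2, 0.1])

# Cumulative fraction of total variance explained by the first k components
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()

# Smallest k whose components together retain at least 95% of the variance
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(k, cumulative)
```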

A few practical considerations:

  • Scale Sensitivity: Since PCA is sensitive to the variances of the initial variables, it is crucial to scale the data appropriately before applying PCA (see the sketch after this list).
  • Assumption of Linearity: PCA assumes that the principal components are a linear combination of the original features, which might not hold in all scenarios.
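
For completeness, here is a brief end-to-end sketch using scikit-learn's StandardScaler and PCA on synthetic data; the features are scaled first, precisely because of the scale sensitivity noted above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic data standing in for a real dataset (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = 2 * X[:, 0] + 0.1 * rng.normal(size=200)   # nearly redundant feature

# Scale first, then reduce to two principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
Y = pca.fit_transform(X_scaled)

print(Y.shape)                        # (200, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```

Note that the signs of individual components may differ from a manual eigendecomposition, since eigenvectors are only determined up to sign.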

Principal Component Analysis is a robust, versatile technique for data reduction, feature extraction, and data visualization. Understanding its mathematical foundations and practical implications can greatly enhance its effectiveness in various machine learning and data analysis tasks. By judiciously using PCA, one can unearth subtle patterns in data that would otherwise be obscured by the curse of dimensionality.