Principal Component Analysis

The “curse of dimensionality” refers to the challenges that arise as the number of dimensions in a dataset increases: the volume of the data space grows exponentially, data become sparse, distance metrics lose their meaning, computational cost rises, and the risk of overfitting increases. Principal Component Analysis (PCA) is a powerful tool for mitigating these challenges: it reduces the dimensionality of the data while retaining most of the original variance. This note explores PCA in depth, including its mathematical foundation, applications, and practical considerations.
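
As a rough illustration of the idea, here is a minimal sketch of PCA via an eigendecomposition of the covariance matrix, using NumPy. The function name `pca`, the synthetic data, and the choice of one component are assumptions made for this example only; the full note may take a different route (for instance, SVD or a library implementation).

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    # Center each feature so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    # Feature-by-feature sample covariance matrix.
    cov = np.cov(X_centered, rowvar=False)
    # eigh is suited to symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    projected = X_centered @ components
    # Fraction of the total variance captured by each kept component.
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained_ratio

# Made-up data: three features that are all noisy multiples of one hidden variable.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    -t + 0.1 * rng.normal(size=200),
])

projected, components, explained_ratio = pca(X, n_components=1)
print(projected.shape)      # one column instead of three
print(explained_ratio)      # share of variance kept by the first component
```

Because the three synthetic features are driven by a single hidden variable, one component should capture nearly all of the variance here; real datasets rarely compress this cleanly.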


Covariance

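Covariance is a statistical measure that quantifies the degree to which two variables change together. It’s a key measure for understanding the linear relationship between variables: it is positive when the variables tend to increase together and negative when one tends to decrease as the other increases.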
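
To make the definition concrete, here is a small sketch that computes the sample covariance by hand (dividing by n - 1) and compares it with NumPy's `np.cov`. The helper name `sample_covariance` and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance of two equal-length 1-D sequences (n - 1 denominator)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D, equal length, with at least 2 values")
    # Average product of the deviations from each variable's mean.
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1))

# Toy data: y rises with x, y_down falls with x.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

cov_up = sample_covariance(x, y_up)      # positive: the variables move together
cov_down = sample_covariance(x, y_down)  # negative: they move in opposite directions
cov_numpy = np.cov(x, y_up)[0, 1]        # off-diagonal entry of the 2x2 covariance matrix

print(cov_up, cov_down, cov_numpy)
```

Note that the magnitude of the covariance depends on the units of the variables, so it is the sign, rather than the size, that is directly interpretable here.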

Natural Language Processing: A Primer

I’ve been fascinated by the possibility of extracting knowledge from large bodies of text using computational methods since… well, since I started reading scientific literature. Natural Language Processing (NLP) is a branch of machine learning that focuses on the interaction between computers and humans through natural language. The ultimate objective of NLP is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful. This note gives an overview of basic NLP concepts and methods, with practical examples in Python.