Artificial Neural Networks: Transformers
The Transformer architecture has revolutionized Artificial Neural Networks (ANN), demonstrating unparalleled performance in natural language processing, speech recognition, image processing, and other areas that require efficient handling of sequence data. This architecture is the main catalyst behind the current boom in generative AI, which made models like GPT and Stable Diffusion mainstream names. In this note we will take a deep look into the original implementation of the transformer model, and will then expand it to some of the modern variants.