Formal Algorithms for Transformers
Read::
Print::
Zotero Link:: NA
PDF:: NA
Files:: arXiv.org Snapshot; Phuong_Hutter_2022_Formal Algorithms for Transformers.pdf; Twitter Thread
Reading Note:: Mary Phuong, Marcus Hutter 2022
Web Rip::
TABLE without id
file.link as "Related Files",
title as "Title",
type as "type"
FROM "" AND -"ZZ. planning"
WHERE citekey = "phuongFormalAlgorithmsTransformers2022"
SORT file.cday DESC
Abstract
This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (not results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
Quick Reference
Top Comments
- Started it, but didn't get very far. Would make a good podcast topic with V.
- Just finished a quick read-through, and it's denser than I thought; I quickly got lost in the "math".
- He also links to an illustrated guide: The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.
- The rest of the website is pretty interesting, and I'm subbing to the YT channel: Jay Alammar – Visualizing machine learning one concept at a time.
Topics
Tasks