Curriculum, personal interests, and reference material.
- Category Theory - Bartosz Milewski
- Category Theory 2 - Bartosz Milewski
- Category Theory 3 - Bartosz Milewski
- Applied Category Theory - David Spivak / Brendan Fong, MIT 18.S097
- Categorical Databases talk, David Spivak
- Category theory is a universal modeling language
- Probabilistic Systems Analysis and Applied Probability - MIT 6.041
- MIT 6.262 Discrete Stochastic Processes, Spring 20
- MIT 18.650 Statistics for Applications, Fall 20
- Artificial Intelligence - Patrick Winston, MIT 6.034
- CS480/680 Machine Learning - University of Waterloo
- Machine Learning, Andrew Ng - Stanford
- Machine Learning, Andrew Ng
- Deep Learning 2015, Nando de Freitas - Oxford
- Cornell CS4780 Machine Learning for Intelligent Systems
- The spelled-out intro to neural networks and backpropagation series - Andrej Karpathy
- Introduction to Data-Centric AI - MIT IAP 2023
- Natural Language Processing - Dan Jurafsky / Chris Manning - Broken link :(
- Stanford CS224N: NLP with Deep Learning | Winter 2019
- Deep Learning for NLP at Oxford with DeepMind 2017
- Intro to Reinforcement Learning - David Silver
- Advanced Deep Learning & Reinforcement Learning
- CS885 Reinforcement Learning - University of Waterloo
- Neural Networks and the Chomsky Hierarchy (A comparative study on model generalization)
- Liquid Time-constant Networks
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- Event-Based Backpropagation can compute Exact Gradients for Spiking Neural Networks
- The Forward-Forward Algorithm: Some Preliminary Investigations
- The Predictive Forward-Forward Algorithm
- Knowledge is a Region in Weight Space for Fine-tuned Language Models
- Beyond neural scaling laws: beating power law scaling via data pruning
- LoRA Learns Less and Forgets Less
- Attention Is All You Need
- Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
- Rethinking Search: Making Experts out of Dilettantes
- Toolformer: Language Models Can Teach Themselves to Use Tools
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- Reformer: The Efficient Transformer
- Semantic Tokenizer for Enhanced Natural Language Processing
- Unlimiformer: Long-Range Transformers with Unlimited Length Input
- The Power of Scale for Parameter-Efficient Prompt Tuning
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
- Prompting Is Programming: A Query Language for Large Language Models (LMQL)
- Fine-Tuning Language Models with Just Forward Passes - code
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- MemGPT: Towards LLMs as Operating Systems
- The Curious Case of Neural Text Degeneration
- LoRA: Low-Rank Adaptation of Large Language Models
- Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- RWKV: Reinventing RNNs for the Transformer Era
- Think before you speak: Training Language Models With Pause Tokens
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
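As a pointer into the transformer papers above ("Attention Is All You Need"), here is a minimal NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V; the function name and shapes are illustrative, not from any particular codebase:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Each output row is a convex combination of the rows of V, weighted by query-key similarity.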
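The LoRA papers listed above ("LoRA: Low-Rank Adaptation of Large Language Models", "LoRA Learns Less and Forgets Less") rest on one idea: freeze the pretrained weight W and train only a low-rank delta B @ A. A toy NumPy sketch (dimensions and names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2                       # output dim, input dim, LoRA rank (r << min(d, k))

W = rng.standard_normal((d, k))         # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection; zero init => delta is 0 at start

def lora_forward(x, scale=1.0):
    """Forward pass with the adapted weight W + scale * (B @ A)."""
    return x @ (W + scale * (B @ A)).T
```

Because B starts at zero, the adapted model is exactly the pretrained model at initialization; only 2 * r * k-ish parameters are trained instead of d * k.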
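"The Curious Case of Neural Text Degeneration" is where nucleus (top-p) sampling comes from: keep only the smallest set of tokens whose cumulative probability reaches p, then renormalize. A hedged sketch of that filtering step (function name is mine, not from the paper):

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Zero out all but the smallest prefix of tokens (by descending probability)
    whose cumulative mass reaches p, then renormalize."""
    order = np.argsort(probs)[::-1]          # token indices, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # size of the nucleus
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```

Sampling from the filtered distribution truncates the unreliable low-probability tail while adapting the nucleus size to how peaked the distribution is.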