
Makemore

Building Karpathy's Makemore, an autoregressive character-level language model for making more words - 20th Summer Project by Han, 2024

These are my notes on learning Makemore and implementing it step by step.

Makemore was originally created by Andrej Karpathy.

Written in a mix of Indonesian and English.

The model in action can be seen in this YouTube link.

What I learned

  • PyTorch syntax (I found that much of it is similar to NumPy), indexing, matrix (tensor) multiplication, reading the documentation, and the process of interpreting tensor shapes
  • Bigram language model, built from a character-to-index dictionary and a lookup table of counts (see the first sketch after this list)
  • Sampling with torch.multinomial
  • One-hot encoding
  • How the torch generator seed works
  • Average negative log likelihood (from the log likelihood down to its average)
  • Smoothing the bigram model
  • Building a simple linear bigram neural network and training it with gradient descent in PyTorch (sketched after this list)
  • The softmax activation function
  • Building a dataset of 3-character contexts with 1 label (the next character)
  • An embedding matrix for the characters
  • Building a neural network from scratch using torch.tensor
  • Reshaping a tensor, for example from [228146, 3, 2] to [228146, 6], with the view() method
  • Average negative log likelihood + cross entropy loss
  • Why the cross entropy function has better numerical stability: since the output probabilities are already normalized, shifting the logit values does not change them. You can try adding +10 to all the numbers in the logits and see that the resulting probabilities stay the same, so torch automatically computes the max value in the logits and subtracts it from the logits before exponentiating (sketched after this list)
  • Splitting the dataset (train, val, test)
  • Methods for running the calculations on a GPU
  • Finding a good learning rate
  • Training with a mini-batch approach (the split, GPU transfer, learning-rate sweep, and mini-batching are sketched after this list)
  • The vanishing gradient problem: analyzing the weakness of tanh as a squashing function, and how to fix it with He/Kaiming init or by adding a batch normalization layer (sketched after this list)
  • The weakness of He/Kaiming init in deeper neural nets
  • A little bit about dying ReLU and dead neurons
  • The batch normalization layer in depth, and two approaches to estimating the mean and standard deviation used by the batchnorm layer at inference time
  • torch.nn modules
  • Encoder-decoder
  • Transformer
  • Key, query, and value vectors
  • Word embedding + positional embedding
  • Residual connections
  • Self-attention (implemented, sketched after this list) and cross-attention (definition only)
  • Multi-head attention
  • Masking (preventing the current token from communicating with future tokens, the decoder structure)
  • Layer Normalization
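
A minimal sketch of the counting bigram model and the sampling loop. The `words` list here is a hypothetical toy stand-in for the real names dataset; everything else follows the steps listed above (character dictionary, count lookup table, +1 smoothing, seeded generator, torch.multinomial).

```python
import torch

# toy stand-in for the real names dataset
words = ["emma", "olivia", "ava"]

# character vocabulary, with '.' as the start/end token at index 0
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}

# lookup table of counts: N[i, j] = how often character j follows character i
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# +1 is the bigram model smoothing; each row is normalized into probabilities
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# sample a new word with torch.multinomial; the seeded generator makes it reproducible
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```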
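
Continuing from that sketch (reusing `words`, `stoi`, and `P`), the average negative log likelihood is the log probability of every training bigram, averaged and negated:

```python
# average negative log likelihood of the training bigrams under P
log_likelihood, n = 0.0, 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
nll = -log_likelihood / n  # lower is better; this is the training loss
print(nll)
```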
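
The neural-network version of the bigram model replaces the count table with a single linear layer trained by gradient descent, using one-hot encoding and a manual softmax. A rough sketch, again reusing `words` and `stoi` from the first sketch; the step count and learning rate of 50 are just assumptions that work for this tiny model:

```python
import torch
import torch.nn.functional as F

# integer bigram pairs (input char index, target char index)
xs, ys = [], []
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

vocab = len(stoi)
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((vocab, vocab), generator=g, requires_grad=True)  # the single linear layer

for _ in range(100):
    xenc = F.one_hot(xs, num_classes=vocab).float()           # one-hot encode the inputs
    logits = xenc @ W                                          # interpret as log-counts
    probs = logits.exp() / logits.exp().sum(1, keepdim=True)   # softmax
    loss = -probs[torch.arange(len(ys)), ys].log().mean()      # average negative log likelihood
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad                                     # gradient descent step
```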
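
For the MLP model, the dataset is built by sliding a 3-character context over each word, and each character gets a row in a small embedding matrix. A sketch reusing `words` and `stoi`; the [228146, 3, 2] to [228146, 6] shapes mentioned above apply to the full dataset, not this toy one:

```python
import torch

block_size = 3  # how many characters of context predict the next one

X, Y = [], []
for w in words:
    context = [0] * block_size               # pad with the '.' token
    for ch in w + ".":
        X.append(context)
        Y.append(stoi[ch])
        context = context[1:] + [stoi[ch]]   # slide the window
X, Y = torch.tensor(X), torch.tensor(Y)      # [num_examples, 3] and [num_examples]

C = torch.randn((len(stoi), 2))              # 2-dimensional character embeddings
emb = C[X]                                   # [num_examples, 3, 2]
h_in = emb.view(emb.shape[0], -1)            # [num_examples, 6], flattened for the hidden layer
```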
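
The numerical-stability point about cross entropy can be checked directly: adding a constant to all logits leaves the probabilities unchanged, and F.cross_entropy exploits this by subtracting the max logit internally so exp() never overflows. A small self-contained check:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

for shift in (0.0, 10.0):
    shifted = logits + shift
    probs = shifted.exp() / shifted.exp().sum(1, keepdim=True)  # manual softmax
    # same probabilities and same loss for both shifts
    print(probs, F.cross_entropy(shifted, target).item())
```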
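
A sketch of the training plumbing: the dataset split, moving tensors to the GPU, a learning-rate sweep, and sampling a mini-batch per step. It reuses `X` and `Y` from the context-window sketch above; the 80/10/10 split fractions and batch size of 32 are my own assumptions:

```python
import torch

# train / val / test split
n1, n2 = int(0.8 * len(X)), int(0.9 * len(X))
Xtr, Ytr = X[:n1], Y[:n1]         # train
Xval, Yval = X[n1:n2], Y[n1:n2]   # validation (for tuning hyperparameters)
Xte, Yte = X[n2:], Y[n2:]         # test (touched once, at the end)

# move the calculation to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
Xtr, Ytr = Xtr.to(device), Ytr.to(device)

# candidate learning rates for a sweep, spaced exponentially between 0.001 and 1
lrs = 10 ** torch.linspace(-3, 0, 1000)

# one mini-batch: 32 random rows instead of the full training set
ix = torch.randint(0, Xtr.shape[0], (32,), device=device)
xb, yb = Xtr[ix], Ytr[ix]
```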
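
On the initialization side, the tanh saturation problem can be tackled either by scaling the incoming weights (Kaiming/He init, with the 5/3 gain for tanh) or by inserting a batch normalization layer, which also keeps running statistics for inference. A sketch with example layer sizes of my choosing:

```python
import torch
import torch.nn as nn

fan_in, hidden = 6, 200
x = torch.randn(32, fan_in)  # a fake batch of flattened embeddings

# Kaiming/He init: scale by gain / sqrt(fan_in) so the tanh pre-activations are
# neither saturated (vanishing gradients, dead neurons) nor too spread out
W1 = torch.randn((fan_in, hidden)) * (5 / 3) / fan_in ** 0.5
h = torch.tanh(x @ W1)

# alternative/complementary fix: batch normalization after the linear layer;
# nn.BatchNorm1d keeps running estimates of mean and std to use at inference time
bn = nn.BatchNorm1d(hidden)
h_bn = torch.tanh(bn(x @ W1))
```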
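
Finally, a single head of masked self-attention, the core of the decoder blocks: keys, queries, and values are linear projections, affinities are scaled dot products, and the triangular mask stops tokens from attending to the future. Multi-head attention runs several of these heads in parallel and concatenates their outputs. The batch, time, channel, and head sizes here are example values, not the repo's settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, C = 4, 8, 32        # batch, time (tokens), channels
head_size = 16
x = torch.randn(B, T, C)  # stand-in for token embeddings + positional embeddings

key = nn.Linear(C, head_size, bias=False)
query = nn.Linear(C, head_size, bias=False)
value = nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)
wei = q @ k.transpose(-2, -1) * head_size ** -0.5   # [B, T, T] token-to-token affinities
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))     # decoder mask: no looking at future tokens
wei = F.softmax(wei, dim=-1)
out = wei @ v                                       # [B, T, head_size]
```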

Noted and created by Han, Summer 2024

Part of The 20th Summer Project
