gmontamat / poor-mans-transformers
Implement Transformers (and Deep Learning) from scratch in NumPy
☆25 · Updated last year
Alternatives and similar repositories for poor-mans-transformers:
Users interested in poor-mans-transformers are comparing it to the repositories listed below.
- A NumPy implementation of the Transformer model in "Attention is All You Need" ☆53 · Updated 6 months ago
- Tutorial for how to build BERT from scratch ☆86 · Updated 8 months ago
- Reference implementation of Mistral AI 7B v0.1 model. ☆28 · Updated last year
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆140 · Updated 7 months ago
- Well documented, unit tested, type checked and formatted implementation of a vanilla transformer - for educational purposes. ☆233 · Updated 9 months ago
- ☆140 · Updated 11 months ago
- Notes about "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA) ☆239 · Updated last year
- Annotated version of the Mamba paper ☆471 · Updated 11 months ago
- ☆16 · Updated 3 weeks ago
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆50 · Updated 9 months ago
- PyTorch implementation of the xLSTM model by Beck et al. (2024) ☆152 · Updated 5 months ago
- ☆68 · Updated 10 months ago
- LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch ☆95 · Updated last year
- LLaMA 2 implemented from scratch in PyTorch ☆287 · Updated last year
- Highly commented implementations of Transformers in PyTorch ☆132 · Updated last year
- I will build Transformer from scratch ☆53 · Updated 8 months ago
- If tinygrad wasn't small enough for you... ☆677 · Updated 10 months ago
- A repository for log-time feedforward networks ☆218 · Updated 9 months ago
- Tutorials on tinygrad ☆314 · Updated this week
- my attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture ☆129 · Updated 8 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆178 · Updated 4 months ago
- ☆78 · Updated 10 months ago
- ☆110 · Updated 3 weeks ago
- ML/DL Math and Method notes ☆58 · Updated last year
- Prune transformer layers ☆67 · Updated 8 months ago
- The Tensor (or Array) ☆420 · Updated 5 months ago
- Distributed training (multi-node) of a Transformer model ☆50 · Updated 9 months ago
- The purpose of this repo is to make it easy to get started with JAX, Flax, and Haiku. It contains my "Machine Learning with JAX" series o… ☆680 · Updated last year
- Basic implementation of BERT and Transformer in PyTorch in one short Python file (also includes "predict next word" GPT task) ☆41 · Updated last year
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆360 · Updated 8 months ago