Attention is all you need implementation
☆1,179 · Jun 8, 2024 · Updated last year
Alternatives and similar repositories for pytorch-transformer
Users who are interested in pytorch-transformer are comparing it to the repositories listed below.
- LLaMA 2 implemented from scratch in PyTorch ☆365 · Sep 25, 2023 · Updated 2 years ago
- Notes about "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA) ☆342 · May 28, 2023 · Updated 2 years ago
- Stable Diffusion implemented from scratch in PyTorch ☆1,030 · Oct 22, 2024 · Updated last year
- LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch ☆126 · Jul 24, 2023 · Updated 2 years ago
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆593 · Dec 6, 2024 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆94 · Apr 10, 2024 · Updated last year
- ML algorithms implementations that are good for learning the underlying principles ☆27 · Dec 7, 2024 · Updated last year
- BERT explained from scratch ☆16 · Oct 26, 2023 · Updated 2 years ago
- ☆239 · Jan 2, 2025 · Updated last year
- Notes and commented code for RLHF (PPO) ☆126 · Feb 27, 2024 · Updated 2 years ago
- Transformer: PyTorch Implementation of "Attention Is All You Need" ☆4,450 · Jul 15, 2025 · Updated 7 months ago
- Notes on quantization in neural networks ☆120 · Dec 14, 2023 · Updated 2 years ago
- Notes on the Mistral AI model ☆20 · Dec 27, 2023 · Updated 2 years ago
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 7 months ago
- Video+code lecture on building nanoGPT from scratch ☆4,776 · Aug 13, 2024 · Updated last year
- Simple transformer tts ☆57 · Jul 18, 2025 · Updated 7 months ago
- Implement a ChatGPT-like LLM in PyTorch from scratch, step by step ☆87,151 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆54,071 · Nov 12, 2025 · Updated 3 months ago
- A resource for learning about Machine learning & Deep Learning ☆8,403 · Aug 17, 2024 · Updated last year
- Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course. ☆17,281 · Feb 11, 2026 · Updated 3 weeks ago
- Implementation of the paper "Denoising Diffusion Probabilistic Models" in PyTorch ☆67 · Jul 4, 2023 · Updated 2 years ago
- ☆18 · Jan 3, 2025 · Updated last year
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆599 · Oct 7, 2025 · Updated 5 months ago
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆14,842 · Aug 8, 2024 · Updated last year
- a minimal cache manager for PagedAttention, on top of llama3. ☆136 · Aug 26, 2024 · Updated last year
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Jan 24, 2025 · Updated last year
- ☆22 · Aug 14, 2024 · Updated last year
- Fast and memory-efficient exact attention ☆22,460 · Updated this week
- Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. ☆76,159 · Feb 5, 2026 · Updated last month
- ☆4,544 · Jan 31, 2024 · Updated 2 years ago
- From scratch implementation of a vision language model in pure PyTorch ☆253 · May 6, 2024 · Updated last year
- ☆18 · Feb 7, 2021 · Updated 5 years ago
- Trained a 114 million Parameter LLM from Scratch. ☆19 · Jul 21, 2024 · Updated last year
- Material for gpu-mode lectures ☆5,800 · Feb 1, 2026 · Updated last month
- This repository contains demos I made with the Transformers library by HuggingFace. ☆11,511 · Updated this week
- Official code repo for the O'Reilly Book - "Hands-On Large Language Models" ☆23,193 · Dec 17, 2025 · Updated 2 months ago
- GPU Kernels ☆221 · Apr 27, 2025 · Updated 10 months ago
- 100 days of building GPU kernels! ☆575 · Apr 27, 2025 · Updated 10 months ago
- LLM101n: Let's build a Storyteller ☆36,432 · Aug 1, 2024 · Updated last year