liyuan24 / deepseek_from_scratch
☆14 · Updated 2 months ago
Alternatives and similar repositories for deepseek_from_scratch
Users that are interested in deepseek_from_scratch are comparing it to the libraries listed below
- ☆174 · Updated 5 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 · Updated 3 weeks ago
- GPU Kernels ☆182 · Updated last month
- ☆39 · Updated last month
- 100 days of building GPU kernels! ☆445 · Updated last month
- Notes and commented code for RLHF (PPO) ☆96 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆72 · Updated last year
- ☆343 · Updated 2 months ago
- ☆193 · Updated 4 months ago
- ☆298 · Updated 6 months ago
- ☆25 · Updated 8 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆152 · Updated 3 months ago
- Notes about LLaMA 2 model ☆61 · Updated last year
- making the official triton tutorials actually comprehensible ☆41 · Updated 3 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 · Updated last month
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆311 · Updated last month
- PyTorch implementations of algorithms from "Reinforcement Learning: An Introduction" by Sutton and Barto, along with various RL research … ☆146 · Updated this week
- Building blocks for foundation models. ☆511 · Updated last year
- ☆89 · Updated 9 months ago
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/ ☆55 · Updated 2 months ago
- A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in pytorch ☆281 · Updated last week
- minimal GRPO implementation from scratch ☆90 · Updated 3 months ago
- BERT explained from scratch ☆14 · Updated last year
- Slides, notes, and materials for the workshop ☆326 · Updated last year
- "LLM from Zero to Hero: An End-to-End Large Language Model Journey from Data to Application!" ☆30 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆368 · Updated 3 months ago
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆47 · Updated last year
- Notes on Direct Preference Optimization ☆19 · Updated last year
- repo of paper implementations ☆20 · Updated 4 months ago
- LLaMA 2 implemented from scratch in PyTorch ☆335 · Updated last year