karpathy / nano-llama31
nanoGPT style version of Llama 3.1
☆1,357 · Updated 8 months ago
Alternatives and similar repositories for nano-llama31
Users who are interested in nano-llama31 are comparing it to the repositories listed below:
- NanoGPT (124M) in 3 minutes ☆2,501 · Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,184 · Updated last week
- The Multilayer Perceptron Language Model ☆543 · Updated 8 months ago
- The Autograd Engine ☆600 · Updated 7 months ago
- The n-gram Language Model ☆1,416 · Updated 8 months ago
- Minimalistic large language model 3D-parallelism training ☆1,808 · Updated this week
- Code for BLT research paper ☆1,532 · Updated last week
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆865 · Updated 2 months ago
- The Tensor (or Array) ☆429 · Updated 8 months ago
- DataComp for Language Models ☆1,283 · Updated last month
- Video+code lecture on building nanoGPT from scratch ☆4,065 · Updated 8 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,287 · Updated this week
- A PyTorch native library for large-scale model training ☆3,627 · Updated this week
- UNet diffusion model in pure CUDA ☆602 · Updated 9 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,008 · Updated last month
- Recipes to scale inference-time compute of open models ☆1,058 · Updated 2 months ago
- PyTorch native post-training library ☆5,123 · Updated this week
- Puzzles for learning Triton ☆1,591 · Updated 5 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,378 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,462 · Updated this week
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,514 · Updated last year
- Muon is Scalable for LLM Training ☆1,029 · Updated last month
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,076 · Updated 3 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,190 · Updated 5 months ago
- ☆4,076 · Updated 10 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆882 · Updated last week
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆981 · Updated 10 months ago
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,393 · Updated 3 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆635 · Updated 2 months ago
- Large Concept Models: Language modeling in a sentence representation space ☆2,104 · Updated 2 months ago