karpathy / nano-llama31
nanoGPT style version of Llama 3.1
☆1,316 Updated 6 months ago
Alternatives and similar repositories for nano-llama31:
Users who are interested in nano-llama31 are comparing it to the libraries listed below.
- NanoGPT (124M) in 3 minutes ☆2,294 Updated this week
- The n-gram Language Model ☆1,386 Updated 6 months ago
- The Multilayer Perceptron Language Model ☆538 Updated 6 months ago
- A PyTorch native library for large model training ☆3,326 Updated this week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆841 Updated this week
- The Autograd Engine ☆573 Updated 5 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,250 Updated this week
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆724 Updated this week
- Code for BLT research paper ☆1,400 Updated this week
- Minimalistic large language model 3D-parallelism training ☆1,483 Updated this week
- The Tensor (or Array) ☆423 Updated 6 months ago
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,246 Updated last month
- PyTorch native post-training library ☆4,856 Updated this week
- UNet diffusion model in pure CUDA ☆599 Updated 7 months ago
- Recipes to scale inference-time compute of open models ☆1,002 Updated last month
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆973 Updated 8 months ago
- Puzzles for learning Triton ☆1,403 Updated 3 months ago
- Bringing BERT into modernity via both architecture changes and scaling ☆1,199 Updated last week
- System 2 Reasoning Link Collection ☆794 Updated 2 weeks ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆877 Updated 3 weeks ago
- Everything about the SmolLM2 and SmolVLM family of models ☆1,888 Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,160 Updated this week
- DataComp for Language Models ☆1,230 Updated 2 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆772 Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,448 Updated this week
- Video+code lecture on building nanoGPT from scratch ☆3,883 Updated 6 months ago
- What would you do with 1000 H100s... ☆1,001 Updated last year
- A bibliography and survey of the papers surrounding o1 ☆1,155 Updated 3 months ago