karpathy / nano-llama31
nanoGPT-style version of Llama 3.1
☆1,290 Updated 5 months ago
Alternatives and similar repositories for nano-llama31:
Users who are interested in nano-llama31 are comparing it to the libraries listed below:
- NanoGPT (124M) in 3.4 minutes ☆2,068 Updated last week
- The Multilayer Perceptron Language Model ☆532 Updated 5 months ago
- The n-gram Language Model ☆1,363 Updated 5 months ago
- Recipes to scale inference-time compute of open models ☆932 Updated this week
- The Autograd Engine ☆550 Updated 4 months ago
- Code for BLT research paper ☆1,314 Updated this week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆831 Updated last month
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,217 Updated last month
- The Tensor (or Array) ☆418 Updated 5 months ago
- A PyTorch native library for large model training ☆3,091 Updated this week
- System 2 Reasoning Link Collection ☆722 Updated this week
- Large Concept Models: Language modeling in a sentence representation space ☆1,713 Updated this week
- Everything about the SmolLM & SmolLM2 family of models ☆1,554 Updated last week
- DataComp for Language Models ☆1,206 Updated last month
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆644 Updated this week
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆788 Updated 5 months ago
- ☆996 Updated last month
- Video+code lecture on building nanoGPT from scratch ☆3,782 Updated 5 months ago
- Minimalistic large language model 3D-parallelism training ☆1,386 Updated this week
- Tile primitives for speedy kernels ☆1,923 Updated this week
- Puzzles for learning Triton ☆1,300 Updated last month
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,462 Updated this week
- Bringing BERT into modernity via both architecture changes and scaling ☆1,045 Updated last week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆965 Updated 5 months ago
- UNet diffusion model in pure CUDA ☆596 Updated 6 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆614 Updated 6 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆752 Updated this week
- An Open Large Reasoning Model for Real-World Solutions ☆1,378 Updated last month
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,908 Updated 5 months ago
- Training LLMs with QLoRA + FSDP ☆1,436 Updated 2 months ago