YuvrajSingh-mist / SmolLlama
I trained a 130M-parameter Llama architecture that I coded from the ground up to build a small instruct model from scratch. It was trained on the FineWeb dataset from HuggingFace, consisting of 15M texts (10BT snapshot), for a total of 3 full epochs.
☆16Updated 10 months ago
Alternatives and similar repositories for SmolLlama
Users who are interested in SmolLlama are comparing it to the repositories listed below.
- ☆89Updated last week
- chrome & firefox extension to chat with webpages: local llms☆131Updated last year
- A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks☆38Updated last year
- ☆159Updated 9 months ago
- Solving data for LLMs - Create quality synthetic datasets!☆151Updated last year
- Simple examples using Argilla tools to build AI☆57Updated last year
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆457Updated last year
- ☆75Updated last year
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆61Updated last year
- ☆137Updated last year
- rl from zero pretrain, can it be done? yes.☆286Updated 4 months ago
- One click templates for inferencing Language Models☆227Updated 2 months ago
- ☆46Updated 10 months ago
- Train LLM on Hugging Face infra☆67Updated 2 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free☆232Updated last year
- Easy to use, High Performant Knowledge Distillation for LLMs☆97Updated 8 months ago
- Video+code lecture on building nanoGPT from scratch☆68Updated last year
- A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in pytorch☆402Updated 2 months ago
- Learn the building blocks of how to build gpt-oss from scratch☆112Updated 4 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon.☆85Updated 5 months ago
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆37Updated 8 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆110Updated 10 months ago
- ☆87Updated last year
- Various installation guides for Large Language Models☆77Updated 9 months ago
- Fast parallel LLM inference for MLX☆245Updated last year
- A compact LLM pretrained in 9 days by using high quality data☆340Updated 9 months ago
- Entropy Based Sampling and Parallel CoT Decoding☆17Updated last year
- Following Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish☆172Updated last year
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs☆314Updated 6 months ago
- Finetune Llama-3-8b on the MathInstruct dataset☆115Updated last year