YuvrajSingh-mist / SmolLlama

I trained a 130M-parameter Llama architecture, coded from the ground up, to build a small instruct model from scratch. It was trained on the FineWeb dataset from HuggingFace (10BT snapshot, about 15M texts) for 3 full epochs.

☆15

Alternatives and similar repositories for SmolLlama

Users who are interested in SmolLlama are comparing it to the libraries listed below.

ALucek / GRPO-Training
An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning
☆34Updated last month
tokenbender / avataRL
rl from zero pretrain, can it be done? we'll see.
☆65Updated 3 weeks ago
brendanhogan / picoDeepResearch
☆64Updated last month
kmohan321 / Research_Papers
☆46Updated 3 months ago
cognitivecomputations / grokadamw
☆134Updated 10 months ago
Vaibhavs10 / gpu-poor-llm-notebooks
☆74Updated 9 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53Updated 7 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆54Updated 5 months ago
menloresearch / ReZero
☆156Updated 2 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91Updated 5 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆101Updated 4 months ago
HarleyCoops / StoneyNakoda
A locally trained model of Stoney Nakoda has been developed and released. You can access the working model here or train your own instanc…
☆10Updated 3 months ago
doomslide / hyperobject
Plotting (entropy, varentropy) for small LMs
☆97Updated last month
AK391 / dailypapersHN
☆86Updated 9 months ago
axolotl-ai-cloud / grpo_code
A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.
☆32Updated 3 months ago
nivibilla / build-nanogpt
Video+code lecture on building nanoGPT from scratch
☆69Updated last year
agokrani / distillKitPlus
Easy to use, High Performant Knowledge Distillation for LLMs
☆88Updated 2 months ago
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics jax implementation)
☆64Updated 8 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆96Updated 4 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆71Updated 3 months ago
kabir2505 / tiny-mixtral
☆42Updated 2 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆42Updated 2 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59Updated 5 months ago
phunterlau / paper_without_code
LLM reads a paper and produces a working prototype
☆58Updated 3 months ago
axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18Updated 6 months ago
BhabhaAI / dataformer
Solving data for LLMs - Create quality synthetic datasets!
☆150Updated 5 months ago
arcee-ai / DAM
☆52Updated 8 months ago
ariG23498 / gemma3-object-detection
Fine tune Gemma 3 on an object detection task
☆69Updated this week
teknium1 / ShareGPT-Builder
☆115Updated 6 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140Updated 4 months ago