bkitano / llama-from-scratch
Llama from scratch, or How to implement a paper without crying
☆562 Updated 11 months ago
Alternatives and similar repositories for llama-from-scratch
Users who are interested in llama-from-scratch are comparing it to the libraries listed below.
- Minimalistic large language model 3D-parallelism training ☆1,870 Updated this week
- From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :) ☆706 Updated 6 months ago
- What would you do with 1000 H100s... ☆1,045 Updated last year
- LLaMA 2 implemented from scratch in PyTorch ☆324 Updated last year
- LLM Workshop by Sourab Mangrulkar ☆380 Updated 11 months ago
- nanoGPT style version of Llama 3.1 ☆1,367 Updated 9 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆791 Updated 2 weeks ago
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆1,003 Updated 8 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆711 Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,484 Updated last year
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,209 Updated 3 weeks ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆840 Updated last week
- ☆515 Updated 5 months ago
- A comprehensive deep dive into the world of tokens ☆223 Updated 10 months ago
- Official repository for ORPO ☆452 Updated 11 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,192 Updated 6 months ago
- ☆534 Updated 8 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,526 Updated last year
- LoRA and DoRA from Scratch Implementations ☆202 Updated last year
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,464 Updated 2 months ago
- Best practices for distilling large language models. ☆531 Updated last year
- ☆1,182 Updated 2 months ago
- Best practices & guides on how to write distributed pytorch training code ☆418 Updated 2 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,377 Updated last week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,328 Updated 3 weeks ago
- A bagel, with everything. ☆320 Updated last year
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,381 Updated last year
- A repository for research on medium sized language models. ☆495 Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,516 Updated last week
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript ☆578 Updated 10 months ago