Awesome LLM pre-training resources, including data, frameworks, and methods.
☆388 Apr 29, 2025 Updated last year
Alternatives and similar repositories for awesome-llm-pretraining
Users who are interested in awesome-llm-pretraining are comparing it to the repositories listed below.
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆231 Jul 25, 2025 Updated 11 months ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 Mar 21, 2024 Updated 2 years ago
- 🔥 A minimal training framework for scaling FLA models ☆396 Apr 22, 2026 Updated 2 months ago
- The OlymMATH dataset ☆25 Jun 1, 2025 Updated last year
- A series of technical reports on Slow Thinking with LLM ☆766 Aug 13, 2025 Updated 10 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 Apr 22, 2025 Updated last year
- Collection of papers for scalable automated alignment. ☆92 Oct 22, 2024 Updated last year
- ☆71 Oct 16, 2024 Updated last year
- Large Language Model in Action ☆345 Jan 28, 2025 Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 Jul 16, 2025 Updated 11 months ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,710 Jun 17, 2026 Updated last week
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆14 Oct 3, 2024 Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆65 Oct 3, 2024 Updated last year
- Reproducible, flexible LLM evaluations ☆383 Mar 24, 2026 Updated 3 months ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆22,173 Updated this week
- Ongoing research project for code&math LLMs