RUCAIBox / awesome-llm-pretraining
Awesome LLM pre-training resources, including data, frameworks, and methods.
☆166Updated last month
Alternatives and similar repositories for awesome-llm-pretraining
Users who are interested in awesome-llm-pretraining are comparing it to the repositories listed below.
- ☆150Updated last month
- ☆53Updated 6 months ago
- Train a Language Model with GRPO to create a schedule from a list of events and priorities☆188Updated last month
- This is a survey of research on AI scientists, AI researchers, AI engineers, and a series of AI-driven research studies☆63Updated 2 weeks ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆201Updated 3 months ago
- ☆191Updated last week
- ☆99Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆238Updated last month
- A Comprehensive Survey on Long Context Language Modeling☆146Updated last week
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models☆106Updated 3 weeks ago
- An Open Math Pre-training Dataset with 370B Tokens.☆87Updated last month
- Collect every awesome work about r1!☆369Updated 3 weeks ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆182Updated this week
- ☆175Updated last month
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the…☆98Updated 2 weeks ago
- [ACM Computing Surveys 2025] This repository collects awesome surveys, resources, and papers for Lifelong Learning with Large Language Model…☆129Updated this week
- ☆198Updated last week
- ☆53Updated 2 months ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆257Updated 4 months ago
- Awesome Agent Training☆128Updated last week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆401Updated last month
- Repo of "Quantification of Large Language Model Distillation"☆85Updated last week
- The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era"☆123Updated 10 months ago
- ☆112Updated this week
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆54Updated 2 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆179Updated 2 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆248Updated 2 weeks ago
- Fantastic Data Engineering for Large Language Models☆88Updated 5 months ago
- Related work and background techniques for OpenAI o1☆221Updated 4 months ago
- ☆187Updated last month