RUC-GSAI / YuLan-Mini
A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.
☆176Updated 3 weeks ago
Alternatives and similar repositories for YuLan-Mini:
Users who are interested in YuLan-Mini are comparing it to the repositories listed below:
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆242Updated 2 weeks ago
- ☆192Updated 2 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆195Updated last month
- A Comprehensive Survey on Long Context Language Modeling☆138Updated last month
- ☆287Updated last month
- Fantastic Data Engineering for Large Language Models☆87Updated 4 months ago
- ☆144Updated last month
- ☆168Updated last month
- ☆115Updated last week
- ☆276Updated 9 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆135Updated 4 months ago
- ☆138Updated this week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆175Updated last month
- ☆149Updated this week
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆141Updated 2 weeks ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆249Updated 4 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆94Updated last month
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆101Updated 3 months ago
- ☆63Updated 5 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆74Updated last month
- ☆94Updated 4 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆234Updated 3 weeks ago
- ☆314Updated 7 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆112Updated 2 weeks ago
- An Open Math Pre-training Dataset with 370B Tokens.☆78Updated last month
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆149Updated 7 months ago
- ☆183Updated 3 weeks ago
- On Memorization of Large Language Models in Logical Reasoning☆65Updated last month
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆312Updated 3 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆136Updated last month