Aegis1863 / LLMs-Distillation-Quantification
Repo of "Quantification of Large Language Model Distillation"
☆78 · Updated last month

Alternatives and similar repositories for LLMs-Distillation-Quantification:
Users interested in LLMs-Distillation-Quantification are comparing it to the repositories listed below.
- Knowledge-Reasoning Synergy Reinforcement Learning ☆34 · Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details ☆174 · Updated last week
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆74 · Updated last month
- ☆41 · Updated this week
- ☆74 · Updated 3 weeks ago
- An Open Math Pre-training Dataset with 370B Tokens ☆72 · Updated 3 weeks ago
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation ☆88 · Updated last month
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆152 · Updated last week
- ☆101 · Updated 4 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆79 · Updated 2 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆64 · Updated 2 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆132 · Updated 10 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆86 · Updated last month
- A Comprehensive Survey on Long Context Language Modeling ☆131 · Updated 3 weeks ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆182 · Updated 2 weeks ago
- ☆130 · Updated 3 months ago
- ☆93 · Updated 4 months ago
- ☆57 · Updated last month
- Scaling Deep Research via Reinforcement Learning in Real-world Environments ☆282 · Updated last week
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 · Updated 3 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs ☆42 · Updated 10 months ago
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆236 · Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior ☆229 · Updated last week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆158 · Updated last month
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models ☆101 · Updated 3 weeks ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning (COLM 2024) ☆32 · Updated 10 months ago
- ☆52 · Updated 2 months ago
- ☆153 · Updated 3 weeks ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆117 · Updated last week
- ☆40 · Updated last month