RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆253 · Updated 2 months ago
Alternatives and similar repositories for compute-optimal-tts:
Users interested in compute-optimal-tts are also comparing it to the libraries listed below.
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates ☆382 · Updated last month
- ☆287 · Updated last month
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆220 · Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆175 · Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆195 · Updated last month
- AN O1 REPLICATION FOR CODING ☆333 · Updated 4 months ago
- Large Reasoning Models ☆804 · Updated 5 months ago
- TTRL: Test-Time Reinforcement Learning ☆407 · Updated last week
- Explore the Multimodal “Aha Moment” on 2B Model ☆583 · Updated last month
- ☆153 · Updated last month
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆495 · Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆163 · Updated this week
- ☆192 · Updated 2 months ago
- A series of technical reports on Slow Thinking with LLM ☆659 · Updated 3 weeks ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆312 · Updated 3 weeks ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆176 · Updated 3 weeks ago
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆350 · Updated last week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆123 · Updated this week
- ☆168 · Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆186 · Updated 3 weeks ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆180 · Updated this week
- ☆138 · Updated last week
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆141 · Updated 2 weeks ago
- Repo for the paper https://arxiv.org/abs/2504.13837 ☆113 · Updated 2 weeks ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆120 · Updated last month
- ☆115 · Updated last week
- ☆279 · Updated 9 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆243 · Updated this week
- ☆149 · Updated last week
- ☆121 · Updated this week