RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆261 · Updated 3 months ago
Alternatives and similar repositories for compute-optimal-tts
Users interested in compute-optimal-tts are also comparing it to the libraries listed below.
- ReasonFlux Series - Open-Sourced Strong Reasoning LLMs ☆396 · Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆213 · Updated 3 weeks ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆231 · Updated 3 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆179 · Updated 2 months ago
- AN O1 REPLICATION FOR CODING ☆336 · Updated 5 months ago
- A highly capable, lightweight 2.4B LLM trained on only 1T tokens of pre-training data, with all details released. ☆182 · Updated this week
- A series of technical reports on Slow Thinking with LLM ☆679 · Updated last week
- TTRL: Test-Time Reinforcement Learning ☆570 · Updated last week
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆141 · Updated 5 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆589 · Updated 2 months ago
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆414 · Updated last week
- Large Reasoning Models ☆804 · Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆205 · Updated this week
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆367 · Updated 2 weeks ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆284 · Updated this week
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆368 · Updated 4 months ago
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆518 · Updated last week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆409 · Updated last month
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆166 · Updated last week
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example” ☆257 · Updated this week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆239 · Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆245 · Updated this week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆631 · Updated 4 months ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆541 · Updated last week