RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆225 Updated last month
Alternatives and similar repositories for compute-optimal-tts:
Users who are interested in compute-optimal-tts are comparing it to the libraries listed below.
- A series of technical reports on Slow Thinking with LLM ☆595 Updated this week
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates ☆353 Updated this week
- Large Reasoning Models ☆799 Updated 3 months ago
- AN O1 REPLICATION FOR CODING ☆329 Updated 3 months ago
- ☆260 Updated last week
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆767 Updated last week
- Explore the Multimodal “Aha Moment” on 2B Model ☆524 Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆148 Updated last week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆158 Updated last week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆376 Updated this week
- ☆559 Updated last week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆597 Updated 2 months ago
- ☆910 Updated 2 months ago
- [CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆326 Updated 3 weeks ago
- ☆485 Updated last week
- A Survey on Efficient Reasoning for LLMs ☆116 Updated this week
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆165 Updated last week
- ☆186 Updated this week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆212 Updated this week
- MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning ☆425 Updated last week
- TransMLA: Multi-Head Latent Attention Is All You Need ☆220 Updated 3 weeks ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆357 Updated 2 months ago
- ☆166 Updated last month
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆105 Updated last week
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆145 Updated this week
- ☆518 Updated this week
- Collect every awesome work about r1! ☆306 Updated last week
- ☆124 Updated 3 weeks ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆229 Updated last month
- ☆264 Updated 8 months ago