tajwarfahim / srt
Official implementation for the paper "Can Large Reasoning Models Self-Train?"
☆49Updated 3 weeks ago
Alternatives and similar repositories for srt
Users who are interested in srt are comparing it to the repositories listed below
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models☆57Updated 4 months ago
- RL Scaling and Test-Time Scaling (ICML'25)☆106Updated 5 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆72Updated 3 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains☆142Updated 2 weeks ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆99Updated last month
- ☆96Updated 9 months ago
- An Open Math Pre-training Dataset with 370B Tokens.☆89Updated 2 months ago
- ☆68Updated 3 months ago
- Long Context Extension and Generalization in LLMs☆57Updated 9 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆25Updated 3 months ago
- o1 Chain of Thought Examples☆33Updated 8 months ago
- Code for "Reasoning to Learn from Latent Thoughts"☆105Updated 2 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆59Updated 4 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization"☆78Updated last month
- Process Reward Models That Think☆41Updated 3 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆48Updated 7 months ago
- ☆52Updated 2 weeks ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆138Updated 9 months ago
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study☆51Updated 7 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆159Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning☆60Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning☆34Updated 9 months ago
- ☆303Updated 2 weeks ago
- ☆24Updated 9 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆37Updated 4 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆46Updated last month
- ☆58Updated last week
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆93Updated 2 weeks ago