Frostlinx / Socratic-Zero
Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning.
☆32 · Updated last month
Alternatives and similar repositories for Socratic-Zero
Users who are interested in Socratic-Zero are comparing it to the repositories listed below.
- One-shot Entropy Minimization ☆187 · Updated 6 months ago
- A Sober Look at Language Model Reasoning ☆89 · Updated last month
- ☆140 · Updated 3 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆97 · Updated last year
- ☆69 · Updated 6 months ago
- ☆346 · Updated 4 months ago
- repo for paper https://arxiv.org/abs/2504.13837 ☆288 · Updated 5 months ago
- Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts" ☆30 · Updated last year
- ☆173 · Updated 2 weeks ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆133 · Updated 8 months ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆37 · Updated 5 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆86 · Updated 5 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025) ☆168 · Updated last month
- ☆189 · Updated 7 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆72 · Updated 7 months ago
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆98 · Updated 9 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆258 · Updated 7 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆97 · Updated 11 months ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆44 · Updated 4 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆84 · Updated 6 months ago
- ☆46 · Updated 8 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆123 · Updated 8 months ago
- ☆53 · Updated 10 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆50 · Updated last year
- ☆135 · Updated 9 months ago
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" ☆138 · Updated last month
- ☆212 · Updated 6 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆88 · Updated 10 months ago
- A research repo for experiments about Reinforcement Finetuning ☆53 · Updated 8 months ago
- ☆57 · Updated 5 months ago