GuanghaoYe / Emergence-of-Thinking
☆50 Updated 3 months ago
Alternatives and similar repositories for Emergence-of-Thinking
Users who are interested in Emergence-of-Thinking are comparing it to the libraries listed below
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆61 Updated 5 months ago
- The code and data for the paper JiuZhang3.0 ☆44 Updated 11 months ago
- ☆66 Updated 5 months ago
- Revisiting Mid-training in the Era of RL Scaling ☆37 Updated 3 weeks ago
- ☆97 Updated 2 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆134 Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆95 Updated last week
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆40 Updated 7 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆57 Updated 3 months ago
- ☆46 Updated this week
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 6 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆84 Updated 7 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆45 Updated this week
- GenRM-CoT: Data release for verification rationales ☆59 Updated 6 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆52 Updated 2 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆26 Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 Updated 7 months ago
- ☆31 Updated 8 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆72 Updated last month
- This is the implementation of LeCo ☆31 Updated 3 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆119 Updated last month
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆45 Updated 4 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 Updated 3 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆74 Updated 11 months ago
- ☆110 Updated 3 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ☆55 Updated 2 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆43 Updated 6 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆179 Updated 2 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆58 Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 7 months ago