InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆169Updated 3 weeks ago
Alternatives and similar repositories for OREAL:
Users who are interested in OREAL are comparing it to the libraries listed below
- ☆184Updated last month
- ☆62Updated 4 months ago
- ☆99Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆102Updated this week
- ☆278Updated 3 weeks ago
- A Comprehensive Survey on Long Context Language Modeling☆129Updated 2 weeks ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆222Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆180Updated 3 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆123Updated 9 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆90Updated last month
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆109Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆140Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆128Updated 3 months ago
- ☆151Updated this week
- SOTA RL fine-tuning solution for advanced math reasoning of LLM☆103Updated last week
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆170Updated this week
- Reproducing R1 for Code with Reliable Rewards☆167Updated last week
- MMR1: Advancing the Frontiers of Multimodal Reasoning☆153Updated 3 weeks ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆157Updated this week
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆174Updated last month
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates☆369Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆99Updated last month
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆181Updated last year
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆186Updated last month
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆196Updated 11 months ago
- ☆104Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆101Updated 4 months ago
- Related works and background techniques for OpenAI o1☆218Updated 3 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆175Updated this week
- ☆148Updated 3 months ago