☆116Jan 21, 2025Updated last year
Alternatives and similar repositories for OREO
Users that are interested in OREO are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The official implementation of Preference Data Reward-Augmentation.☆18May 1, 2025Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated last year
- ☆64Mar 30, 2026Updated 2 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆100Nov 8, 2025Updated 7 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆63Jun 4, 2024Updated 2 years ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆78Aug 17, 2024Updated last year
- Author implementation of Monte Carlo Augmented Actor Critic in PyTorch☆18Oct 24, 2022Updated 3 years ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- A benchmark for evaluating reinforcement learning algorithms that train the policies using imaginary rollouts from LLMs.☆14Nov 4, 2025Updated 7 months ago
- This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP…☆33Dec 5, 2024Updated last year
- Scalable RL solution for advanced reasoning of language models☆1,861Mar 18, 2025Updated last year
- Reference implementation for Token-level Direct Preference Optimization(TDPO)☆155Feb 14, 2025Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.☆43Oct 31, 2025Updated 7 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]☆65Apr 11, 2026Updated 2 months ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- ☆80Jun 28, 2025Updated 11 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework☆72Jun 1, 2025Updated last year
- The reproduct of the paper - Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction☆21May 29, 2024Updated 2 years ago
- ☆70Jul 8, 2025Updated 11 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents onCollaborative Reasoning Tasks☆268May 5, 2025Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆191May 20, 2025Updated last year
- ☆20Dec 14, 2024Updated last year
- UFT: Unifying Supervised and Reinforcement Fine-Tuning☆30Jun 30, 2025Updated 11 months ago
- Official implementation of ECCV24 paper: POA☆24Aug 8, 2024Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆131Jun 11, 2025Updated last year
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24)☆17Feb 10, 2024Updated 2 years ago
- ☆70Feb 4, 2026Updated 4 months ago
- Train transformer language models with reinforcement learning.☆19Feb 25, 2025Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training☆36Apr 7, 2025Updated last year
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆56Jun 16, 2024Updated last year
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"☆41Sep 24, 2024Updated last year
- ☆80Nov 19, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆116Feb 9, 2024Updated 2 years ago
- Repo of paper "Free Process Rewards without Process Labels"☆171Mar 14, 2025Updated last year
- [ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…☆18Jul 22, 2025Updated 10 months ago
- ☆13May 21, 2024Updated 2 years ago
- ☆76Apr 17, 2026Updated last month
- JudgeLRM: Large Reasoning Models as a Judge☆42May 6, 2026Updated last month
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning☆15Jun 28, 2025Updated 11 months ago