☆116Jan 21, 2025Updated last year
Alternatives and similar repositories for OREO
Users that are interested in OREO are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The official implementation of Preference Data Reward-Augmentation.☆18May 1, 2025Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated last year
- ☆64Mar 30, 2026Updated last month
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆99Nov 8, 2025Updated 6 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆63Jun 4, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆78Aug 17, 2024Updated last year
- A benchmark for evaluating reinforcement learning algorithms that train the policies using imaginary rollouts from LLMs.☆14Nov 4, 2025Updated 6 months ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP…☆33Dec 5, 2024Updated last year
- Reference implementation for Token-level Direct Preference Optimization(TDPO)☆154Feb 14, 2025Updated last year
- Scalable RL solution for advanced reasoning of language models☆1,859Mar 18, 2025Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.☆43Oct 31, 2025Updated 6 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]☆64Apr 11, 2026Updated last month
- ☆78Jun 28, 2025Updated 10 months ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework☆71Jun 1, 2025Updated 11 months ago
- The reproduct of the paper - Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction☆21May 29, 2024Updated last year
- ☆70Jul 8, 2025Updated 10 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents onCollaborative Reasoning Tasks☆268May 5, 2025Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆189May 20, 2025Updated last year
- ☆20Dec 14, 2024Updated last year
- UFT: Unifying Supervised and Reinforcement Fine-Tuning☆30Jun 30, 2025Updated 10 months ago
- Official implementation of ECCV24 paper: POA☆24Aug 8, 2024Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆131Jun 11, 2025Updated 11 months ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24)☆17Feb 10, 2024Updated 2 years ago
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"☆41Sep 24, 2024Updated last year
- ☆68Feb 4, 2026Updated 3 months ago
- Train transformer language models with reinforcement learning.☆19Feb 25, 2025Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training☆36Apr 7, 2025Updated last year
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆56Jun 16, 2024Updated last year
- ☆80Nov 19, 2024Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆116Feb 9, 2024Updated 2 years ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- [ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…☆18Jul 22, 2025Updated 10 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆171Mar 14, 2025Updated last year
- ☆13May 21, 2024Updated 2 years ago
- ☆64Apr 17, 2026Updated last month
- JudgeLRM: Large Reasoning Models as a Judge☆42May 6, 2026Updated 2 weeks ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning☆15Jun 28, 2025Updated 10 months ago
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?☆20Mar 9, 2025Updated last year