☆116Jan 21, 2025Updated last year
Alternatives and similar repositories for OREO
Users that are interested in OREO are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The official implementation of Preference Data Reward-Augmentation.☆18May 1, 2025Updated 11 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated 10 months ago
- ☆64Mar 30, 2026Updated last week
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆95Nov 8, 2025Updated 5 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆63Jun 4, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆78Aug 17, 2024Updated last year
- A benchmark for evaluating reinforcement learning algorithms that train the policies using imaginary rollouts from LLMs.☆14Nov 4, 2025Updated 5 months ago
- Author implementation of Monte Carlo Augmented Actor Critic in PyTorch☆18Oct 24, 2022Updated 3 years ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP…☆32Dec 5, 2024Updated last year
- Reference implementation for Token-level Direct Preference Optimization(TDPO)☆153Feb 14, 2025Updated last year
- Scalable RL solution for advanced reasoning of language models☆1,841Mar 18, 2025Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.☆43Oct 31, 2025Updated 5 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆65Oct 24, 2025Updated 5 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- ☆78Jun 28, 2025Updated 9 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework☆71Jun 1, 2025Updated 10 months ago
- The reproduct of the paper - Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction☆22May 29, 2024Updated last year
- ☆69Jul 8, 2025Updated 9 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents onCollaborative Reasoning Tasks☆266May 5, 2025Updated 11 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆189May 20, 2025Updated 10 months ago
- ☆20Dec 14, 2024Updated last year
- UFT: Unifying Supervised and Reinforcement Fine-Tuning☆27Jun 30, 2025Updated 9 months ago
- Official implementation of ECCV24 paper: POA☆24Aug 8, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆127Jun 11, 2025Updated 10 months ago
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24)☆17Feb 10, 2024Updated 2 years ago
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"☆41Sep 24, 2024Updated last year
- ☆67Feb 4, 2026Updated 2 months ago
- Train transformer language models with reinforcement learning.☆19Feb 25, 2025Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training☆36Apr 7, 2025Updated last year
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆56Jun 16, 2024Updated last year
- ☆79Nov 19, 2024Updated last year
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆116Feb 9, 2024Updated 2 years ago
- Repo of paper "Free Process Rewards without Process Labels"☆171Mar 14, 2025Updated last year
- ☆13May 21, 2024Updated last year
- JudgeLRM: Large Reasoning Models as a Judge☆41Mar 30, 2026Updated last week
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning☆15Jun 28, 2025Updated 9 months ago
- ☆22Feb 12, 2025Updated last year
- The original Shared Recurrent Memory Transformer implementation☆33Jul 11, 2025Updated 9 months ago