☆34Oct 31, 2024Updated last year
Alternatives and similar repositories for RISE
Users that are interested in RISE are comparing it to the libraries listed below.
- GenRM-CoT: Data release for verification rationales☆68Oct 16, 2024Updated last year
- ☆13Jul 14, 2024Updated last year
- ☆27May 30, 2026Updated last week
- Official code for Guiding Language Model Math Reasoning with Planning Tokens☆19Feb 29, 2024Updated 2 years ago
- Official implementation of the NeurIPS 2024 paper CORY☆33Mar 4, 2026Updated 3 months ago
- ☆21Apr 16, 2025Updated last year
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF☆25Oct 8, 2024Updated last year
- ☆75Jun 10, 2025Updated last year
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆38Jul 9, 2025Updated 11 months ago
- Code for Paper "The Geometry of Reasoning: Flowing Logics in Representation Space" (ICLR 2026)☆50Jan 31, 2026Updated 4 months ago
- Minimal Decision Transformer Implementation written in Jax (Flax).☆18Aug 8, 2022Updated 3 years ago
- ☆31Sep 22, 2025Updated 8 months ago
- ☆11Feb 28, 2025Updated last year
- ☆71Jun 23, 2025Updated 11 months ago
- Sotopia-RL: Reward Design for Social Intelligence☆50Apr 1, 2026Updated 2 months ago
- ☆31Sep 12, 2025Updated 8 months ago
- ☆16Jul 29, 2025Updated 10 months ago
- Codebase for Iterative DPO Using Rule-based Rewards☆273Apr 11, 2025Updated last year
- RL algorithm: Advantage induced policy alignment☆66Aug 11, 2023Updated 2 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated 2 years ago
- ☆14Mar 5, 2024Updated 2 years ago
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.☆10May 16, 2024Updated 2 years ago
- ☆35Jan 29, 2023Updated 3 years ago
- Official code for "Pretraining Representations For Data-Efficient Reinforcement Learning" (NeurIPS 2021)☆55Jul 27, 2021Updated 4 years ago
- Code for Abstract-to-Executable Trajectory Translation for One Shot Task Generalization (ICML 2023)☆23May 12, 2023Updated 3 years ago
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts☆15Feb 26, 2024Updated 2 years ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆124Sep 9, 2024Updated last year
- Benchmarking Social Intelligence of Language Agents through Interactive Scenarios☆13Jan 4, 2025Updated last year
- ☆18Oct 16, 2024Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated last year
- Chain-of-Thought Predictive Control☆56May 1, 2023Updated 3 years ago
- [ICML2024] Learning Divergence Fields for Shift-Robust Graph Representations☆11Aug 15, 2024Updated last year
- Official Implementation of SAGE-GRPO: Manifold-Aware Exploration for Reinforcement Learning in Video Generation☆123Apr 2, 2026Updated 2 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆268May 5, 2025Updated last year
- Official implementation of Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers.☆12Mar 1, 2026Updated 3 months ago
- ☆14May 9, 2024Updated 2 years ago
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality☆21Oct 22, 2025Updated 7 months ago
- [ICLR 2025] Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate☆21Apr 22, 2025Updated last year
- [ICLR25] BID-Robot☆66Oct 19, 2025Updated 7 months ago