linhaowei1 / kumo
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
☆19 · Updated 6 months ago
Alternatives and similar repositories for kumo
Users interested in kumo are comparing it to the libraries listed below.
- ☆63 · Updated last month
- Verlog: A Multi-turn RL framework for LLM agents ☆67 · Updated last month
- Natural Language Reinforcement Learning ☆100 · Updated 4 months ago
- ☆65 · Updated 9 months ago
- ☆118 · Updated 8 months ago
- ☆51 · Updated 10 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · Updated 7 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆171 · Updated 3 months ago
- ☆122 · Updated 3 weeks ago
- ☆66 · Updated 6 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25) ☆46 · Updated 8 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆50 · Updated 5 months ago
- A repo for open research on building large reasoning models ☆121 · Updated last week
- PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025) ☆30 · Updated last year
- Dataset Reset Policy Optimization ☆31 · Updated last year
- Paper collections of the continuous effort, starting from World Models ☆191 · Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆147 · Updated last year
- Code for "Reasoning to Learn from Latent Thoughts" ☆123 · Updated 8 months ago
- ☆21 · Updated 7 months ago
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (ICLR'24, Spotlight) ☆65 · Updated 2 years ago
- Reinforcing General Reasoning without Verifiers ☆92 · Updated 5 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆51 · Updated last month
- ☆117 · Updated 11 months ago
- [ICLR 2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ☆45 · Updated 10 months ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… ☆97 · Updated 6 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆63 · Updated 11 months ago
- Official implementation of paper "Process Reward Model with Q-value Rankings" ☆65 · Updated 10 months ago
- ☆133 · Updated last year
- ☆28 · Updated 9 months ago
- ☆108 · Updated last year