tengwang0318 / hierarchial_reward_model
Official Code For "Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models"
☆18 · Updated 6 months ago
Alternatives and similar repositories for hierarchial_reward_model
Users who are interested in hierarchial_reward_model are comparing it to the libraries listed below
- ☆53 · Updated 8 months ago
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning ☆65 · Updated 4 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆23 · Updated 2 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent ☆66 · Updated 5 months ago
- This is the implementation of LeCo ☆31 · Updated 8 months ago
- ☆84 · Updated 6 months ago
- ☆40 · Updated 2 months ago
- ☆33 · Updated 4 months ago
- ☆104 · Updated 10 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆70 · Updated 4 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆82 · Updated 6 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆83 · Updated 4 months ago
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆96 · Updated last week
- ☆38 · Updated 2 months ago
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆20 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆54 · Updated 4 months ago
- MiroRL is an MCP-first reinforcement learning framework for deep research agents. ☆164 · Updated last month
- ☆25 · Updated last year
- ☆46 · Updated 4 months ago
- ☆95 · Updated 10 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆51 · Updated 11 months ago
- Efficient Agent Training for Computer Use ☆130 · Updated last month
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆37 · Updated last month
- A research repo for experiments about Reinforcement Finetuning ☆52 · Updated 6 months ago
- ☆81 · Updated 2 months ago
- This is the official implementation for the paper "PENCIL: Long Thoughts with Short Memory". ☆65 · Updated 5 months ago
- Code and data for QueryAgent (ACL 2024) ☆21 · Updated 10 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆114 · Updated 3 weeks ago
- ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry ☆33 · Updated 3 weeks ago
- ☆43 · Updated last week