THU-KEG / Agentic-Reward-Modeling
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆76Updated 3 weeks ago
Alternatives and similar repositories for Agentic-Reward-Modeling:
Users who are interested in Agentic-Reward-Modeling are comparing it to the repositories listed below
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆86Updated 5 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆131Updated last month
- ☆62Updated this week
- ☆84Updated last month
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆140Updated this week
- ☆44Updated 3 months ago
- ☆103Updated 2 months ago
- ☆24Updated 6 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆80Updated last week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆91Updated this week
- A repository for research on medium-sized language models.☆76Updated 10 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044☆32Updated 5 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆28Updated 2 weeks ago
- MPO: Boosting LLM Agents with Meta Plan Optimization☆43Updated 3 weeks ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆82Updated this week
- ☆111Updated last month
- The official repo for the code and data of paper SMART☆22Updated last month
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆95Updated 2 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆46Updated last month
- ☆35Updated last week
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"☆53Updated 6 months ago
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…☆47Updated 3 weeks ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆115Updated last week
- ☆56Updated 3 months ago
- FuseAI Project☆84Updated 2 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati…☆39Updated 3 weeks ago
- ☆16Updated 3 weeks ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆75Updated 2 weeks ago
- Train your own SOTA deductive reasoning model☆81Updated 3 weeks ago
- ☆80Updated last month