rsshyam / GRPO-bandits
☆13 Updated 11 months ago
Alternatives and similar repositories for GRPO-bandits
Users that are interested in GRPO-bandits are comparing it to the libraries listed below
- ☆16 Updated last year
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆35 Updated 3 weeks ago
- Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025). ☆10 Updated last month
- ☆67 Updated last year
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆29 Updated 2 weeks ago
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆40 Updated last month
- Resa: Transparent Reasoning Models via SAEs ☆41 Updated 3 weeks ago
- ☆12 Updated 7 months ago
- Bayes-Adaptive RL for LLM Reasoning ☆37 Updated 3 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆36 Updated last year
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆15 Updated 10 months ago
- CS194-196 Course Project ☆15 Updated 6 months ago
- ☆23 Updated 11 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆21 Updated last month
- ☆48 Updated 3 months ago
- ☆48 Updated 3 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆70 Updated 3 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆122 Updated 10 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆19 Updated 9 months ago
- ☆19 Updated 5 months ago
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models ☆19 Updated 5 months ago
- ☆67 Updated 5 months ago
- Official repository of Graph RAG-Tool Fusion and ToolLinkOS dataset. ☆16 Updated 6 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆16 Updated 3 weeks ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆10 Updated 8 months ago
- ☆22 Updated 2 weeks ago
- The official implementation of Preference Data Reward-Augmentation. ☆18 Updated 4 months ago
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra… ☆60 Updated 3 weeks ago
- ☆34 Updated 2 weeks ago
- Control LLM ☆19 Updated 4 months ago