rsshyam / GRPO-bandits
☆13 Updated last year
Alternatives and similar repositories for GRPO-bandits
Users who are interested in GRPO-bandits are comparing it to the libraries listed below.
- Resa: Transparent Reasoning Models via SAEs ☆43 Updated 2 weeks ago
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆40 Updated 3 months ago
- CS194-196 Course Project ☆15 Updated 7 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆27 Updated last month
- ☆16 Updated last year
- Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind" ☆19 Updated 2 weeks ago
- ☆67 Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆37 Updated last year
- Lottery Ticket Adaptation ☆40 Updated 10 months ago
- Bayes-Adaptive RL for LLM Reasoning ☆40 Updated 4 months ago
- A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization ☆16 Updated 9 months ago
- The official implementation of Preference Data Reward-Augmentation. ☆18 Updated 5 months ago
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning ☆31 Updated last month
- 🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex… ☆28 Updated this week
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆20 Updated last year
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆16 Updated 7 months ago
- ☆27 Updated 3 months ago
- ☆53 Updated 8 months ago
- ☆48 Updated 5 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆24 Updated 2 months ago
- autonomous agent with access to a tool library ☆42 Updated 2 weeks ago
- ☆13 Updated last month
- ☆46 Updated 4 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆37 Updated 2 months ago
- Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025). ☆10 Updated 2 weeks ago
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning ☆46 Updated 3 weeks ago
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024" ☆12 Updated 7 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆19 Updated 10 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆77 Updated last year
- ☆11 Updated last year