thu-nics / MARSHAL
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
☆27 Updated last month
Alternatives and similar repositories for MARSHAL
Users that are interested in MARSHAL are comparing it to the libraries listed below
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 Updated 3 months ago
- The original Shared Recurrent Memory Transformer implementation ☆33 Updated 5 months ago
- ☆46 Updated 6 months ago
- LLM as World Models using Bayesian inference ☆16 Updated 7 months ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models? ☆30 Updated 2 weeks ago
- Official Repository of Native Parallel Reasoner ☆92 Updated 3 weeks ago
- ☆65 Updated 10 months ago
- Verlog: A Multi-turn RL framework for LLM agents ☆67 Updated last week
- Official Project Page for Monadic Context Engineering (https://arxiv.org/abs/2512.22431) ☆15 Updated last week
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆55 Updated 2 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 Updated 6 months ago
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆35 Updated 3 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 Updated 3 months ago
- ☆64 Updated 2 months ago
- A Practitioner's Guide to M(eow)ti Turn Agentic ReinfOrcement learning ☆68 Updated last month
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆63 Updated this week
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆36 Updated 3 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆42 Updated last year
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆88 Updated 6 months ago
- Official repo of paper LM2 ☆46 Updated 10 months ago
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show… ☆48 Updated this week
- ☆29 Updated 2 months ago
- ☆75 Updated 2 months ago
- ☆20 Updated 3 months ago
- ☆28 Updated 11 months ago
- ☆30 Updated last year
- ☆29 Updated 9 months ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System. ☆18 Updated last year
- ☆118 Updated 3 weeks ago
- ☆50 Updated 10 months ago