SALT-NLP / collaborative-gym

Framework and toolkits for building and evaluating collaborative agents that can work together with humans.

☆91

Alternatives and similar repositories for collaborative-gym

Users who are interested in collaborative-gym are comparing it to the libraries listed below.

StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆232 Updated 2 months ago
facebookresearch / sweet_rl
Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
☆233 Updated 3 months ago
satori-reasoning / Satori
[ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
☆105 Updated 2 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆103 Updated 2 weeks ago
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆208 Updated last year
siyuyuan / evoagent
Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"
☆120 Updated 9 months ago
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆69 Updated 2 months ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆140 Updated 8 months ago
xlang-ai / Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆129 Updated 11 months ago
zorazrw / agent-workflow-memory
AWM: Agent Workflow Memory
☆300 Updated 6 months ago
rxlqn / awesome-llm-self-reflection
Augmented LLM with self-reflection
☆128 Updated last year
OSU-NLP-Group / WebDreamer
"Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
☆78 Updated 4 months ago
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 Updated last year
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37 Updated 7 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆99 Updated last month
WindyLee0822 / Process_Q_Model
Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆60 Updated 6 months ago
Open-Source-O1 / o1_Reasoning_Patterns_Study
☆103 Updated 8 months ago
vsubramaniam851 / multiagent-ft
☆212 Updated 5 months ago
zorazrw / awesome-tool-llm
☆237 Updated 11 months ago
jwhj / OREO
☆114 Updated 6 months ago
jiangjiechen / auction-arena
Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…
☆47 Updated last year
openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆190 Updated last year
zai-org / ComplexFuncBench
Complex Function Calling Benchmark
☆123 Updated 6 months ago
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆169 Updated last month
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆344 Updated 3 weeks ago
microsoft / simulated-trial-and-error
☆122 Updated last year
SALT-NLP / demonstrated-feedback
☆125 Updated 10 months ago
WeiminXiong / MPO
MPO: Boosting LLM Agents with Meta Plan Optimization
☆64 Updated 5 months ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233 Updated 9 months ago
MLE-Dojo / MLE-Dojo
☆61 Updated 2 weeks ago