All-Hands-AI / agent-sdk

A clean, modular SDK for building AI agents with OpenHands V1.

☆22

Alternatives and similar repositories for agent-sdk

Users who are interested in agent-sdk are comparing it to the repositories listed below.

InternLM / SWE-Fixer
☆117 Updated 4 months ago
xlang-ai / computer-agent-arena
Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon!
☆50 Updated 5 months ago
microsoft / SWE-bench-Live
🚀 SWE-bench Goes Live!
☆119 Updated this week
qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆62 Updated 11 months ago
R2E-Gym / R2E-Gym
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆159 Updated 2 months ago
GAIR-NLP / AIME-Preview
☆73 Updated 6 months ago
xlang-ai / OSWorld-G
Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆109 Updated 3 months ago
OSU-NLP-Group / Mind2Web-2
Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge
☆78 Updated 2 months ago
QwenLM / WorldPM
☆89 Updated 4 months ago
zhenyuhe00 / SWE-Swiss
SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution
☆84 Updated this week
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆74 Updated 3 months ago
THUDM / DeepDive
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
☆95 Updated last week
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆215 Updated last year
GAIR-NLP / PC-Agent-E
Efficient Agent Training for Computer Use
☆131 Updated 2 weeks ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆173 Updated 2 months ago
chengyou-jia / AgentStore
[ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
☆40 Updated 9 months ago
Open-Source-O1 / o1_Reasoning_Patterns_Study
☆103 Updated 9 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆84 Updated 4 months ago
MLE-Dojo / MLE-Dojo
☆73 Updated 3 weeks ago
allenai / IFBench
☆73 Updated 2 months ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆171 Updated 3 months ago
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆98 Updated 5 months ago
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆171 Updated 2 months ago
shizhediao / Post-Training-Data-Flywheel
We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.
☆59 Updated 11 months ago
xlang-ai / Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆130 Updated last year
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆143 Updated 10 months ago
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37 Updated 8 months ago
zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆135 Updated 8 months ago
hkust-nlp / WebExplorer
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
☆64 Updated last week
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆93 Updated 4 months ago