benchflow-ai / skillsbench
SkillsBench evaluates how well skills work and how effective agents are at using them
☆25Updated this week
Alternatives and similar repositories for skillsbench
Users who are interested in skillsbench are comparing it to the libraries listed below.
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL☆224Updated 3 months ago
- The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution☆204Updated this week
- [ICML 2025] ResearchTown: Simulator of Human Research Community☆188Updated this week
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning☆145Updated 3 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution☆100Updated 3 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆140Updated last year
- SWE Arena☆35Updated 6 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory☆231Updated 7 months ago
- ☆123Updated this week
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆95Updated 2 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆101Updated 4 months ago
- Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research☆503Updated last week
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning☆110Updated last month
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"☆185Updated 2 months ago
- ☆88Updated 2 months ago
- [NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!☆153Updated last week
- SSRL: Self-Search Reinforcement Learning☆201Updated 4 months ago
- ☆130Updated 8 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond☆187Updated 6 months ago
- Data Synthesis for Deep Research Based on Semi-Structured Data☆191Updated 3 weeks ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆166Updated 2 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆119Updated 7 months ago
- This is a survey of research on AI scientists, AI researchers, AI engineers, and a series of AI-driven research studies☆166Updated 2 months ago
- Implementation for OAgents: An Empirical Study of Building Effective Agents☆299Updated 2 months ago
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…☆411Updated last month
- Repository for Zochi's Research☆297Updated last month
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆254Updated 8 months ago
- Demystifying Reinforcement Learning in Agentic Reasoning☆146Updated 2 months ago
- MrlX: A Multi-Agent Reinforcement Learning Framework☆160Updated last month
- ☆74Updated 3 months ago