EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆108 · Updated 7 months ago
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench are comparing it to the libraries listed below.
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆164 · Updated 2 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆260 · Updated last year
- Paper list for Personal LLM Agents ☆424 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆285 · Updated 9 months ago
- ☆145 · Updated 4 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training. ☆218 · Updated 8 months ago
- ☆102 · Updated 10 months ago
- A Comprehensive Benchmark for Software Development. ☆127 · Updated last year
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆237 · Updated 2 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆192 · Updated last year
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (ICML 2024) ☆179 · Updated 8 months ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent ☆149 · Updated last year
- A Comprehensive Survey on Long Context Language Modeling ☆222 · Updated 2 months ago
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation ☆57 · Updated 6 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆143 · Updated 11 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆177 · Updated 3 months ago
- Official repository for the paper "O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning" ☆97 · Updated 11 months ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆353 · Updated 3 weeks ago
- Official implementation of the paper "PENCIL: Long Thoughts with Short Memory". ☆73 · Updated 8 months ago
- ☆68 · Updated last year
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆133 · Updated 10 months ago
- A comprehensive review of code-domain benchmarks in LLM research. ☆194 · Updated 4 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆200 · Updated 2 months ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆61 · Updated 11 months ago
- ☆142 · Updated 10 months ago
- The official implementation of the paper "Mem-α: Learning Memory Construction via Reinforcement Learning" ☆164 · Updated last month
- ☆29 · Updated 4 months ago
- Chain of Thoughts (CoT) is so hot! So long! We need a short reasoning process! ☆72 · Updated 10 months ago
- Code for the paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆69 · Updated last year
- awesome llm plaza: daily tracking of all sorts of awesome LLM topics, e.g., LLM for coding, robotics, reasoning, multimodal, etc. ☆214 · Updated 3 months ago