EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆107 · Updated 5 months ago
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench are comparing it to the repositories listed below.
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆155 · Updated last month
- Survey Paper List - Efficient LLM and Foundation Models ☆258 · Updated last year
- Paper list for Personal LLM Agents ☆423 · Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆278 · Updated 7 months ago
- A Comprehensive Benchmark for Software Development. ☆124 · Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training. ☆214 · Updated 6 months ago
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆223 · Updated last month
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆41 · Updated last month
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆215 · Updated 10 months ago
- ☆140 · Updated 3 months ago
- ☆94 · Updated 8 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆140 · Updated 10 months ago
- ☆190 · Updated last month
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆93 · Updated 2 years ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆337 · Updated 2 months ago
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆223 · Updated last week
- SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts ☆55 · Updated 2 weeks ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆39 · Updated 10 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆173 · Updated 2 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆622 · Updated 2 months ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆131 · Updated 8 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆189 · Updated last year
- ☆293 · Updated 5 months ago
- GitHub page for "Large Language Model-Brained GUI Agents: A Survey" ☆214 · Updated 5 months ago
- This is the official implementation for paper "PENCIL: Long Thoughts with Short Memory". ☆70 · Updated 7 months ago
- Multimodal Large Language Models for Code Generation under Multimodal Scenarios ☆183 · Updated last week
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆98 · Updated 10 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆215 · Updated 3 weeks ago
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024) ☆64 · Updated 6 months ago
- a curated list of high-quality papers on resource-efficient LLMs 🌱 ☆150 · Updated 9 months ago