EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆108 Updated 6 months ago
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench are comparing it to the repositories listed below.
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆157 Updated last month
- ☆95 Updated 9 months ago
- Paper list for Personal LLM Agents ☆424 Updated last year
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆41 Updated 2 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆259 Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training. ☆216 Updated 7 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆39 Updated 10 months ago
- Reproducing R1 for Code with Reliable Rewards ☆278 Updated 8 months ago
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆231 Updated last month
- A Comprehensive Benchmark for Software Development. ☆124 Updated last year
- ☆143 Updated 3 months ago
- The official implementation of the paper "Mem-α: Learning Memory Construction via Reinforcement Learning" ☆141 Updated 2 weeks ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆61 Updated 10 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆214 Updated 10 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆100 Updated 3 months ago
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆96 Updated 10 months ago
- ☆49 Updated last year
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆93 Updated 2 years ago
- ☆49 Updated 4 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆197 Updated last month
- ☆73 Updated last year
- Accepted LLM Papers in NeurIPS 2024 ☆37 Updated last year
- ☆41 Updated 9 months ago
- ☆36 Updated 10 months ago
- A construction kit for reinforcement learning environment management. ☆292 Updated this week
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆132 Updated 9 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆216 Updated last month
- MrlX: A Multi-Agent Reinforcement Learning Framework ☆160 Updated last month
- A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble" ☆182 Updated 3 weeks ago
- SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts ☆57 Updated last month