EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆106 Updated 4 months ago
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench are comparing it to the repositories listed below.
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆151 Updated last month
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆40 Updated 2 weeks ago
- Survey Paper List - Efficient LLM and Foundation Models ☆258 Updated last year
- A Comprehensive Benchmark for Software Development. ☆116 Updated last year
- Reproducing R1 for Code with Reliable Rewards ☆264 Updated 6 months ago
- Paper list for Personal LLM Agents ☆417 Updated last year
- ☆89 Updated 7 months ago
- ☆147 Updated last week
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training. ☆201 Updated 5 months ago
- ☆134 Updated last month
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆314 Updated last month
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆69 Updated 7 months ago
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆204 Updated 4 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆199 Updated 4 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆132 Updated 8 months ago
- A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble" ☆158 Updated this week
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆97 Updated 8 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆186 Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆136 Updated 6 months ago
- ☆28 Updated last month
- awesome llm plaza: daily tracking all sorts of awesome topics of llm, e.g. llm for coding, robotics, reasoning, multimod etc. ☆209 Updated last week
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent ☆140 Updated 11 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ☆117 Updated 2 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆177 Updated last year
- ☆65 Updated 11 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆93 Updated last month
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆326 Updated last week
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆34 Updated 8 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆206 Updated 8 months ago
- 🔥🔥🔥 ICLR 2025 Oral. Automating Agentic Workflow Generation. ☆304 Updated 3 months ago