EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆103 Updated 3 months ago
Alternatives and similar repositories for ShortcutsBench
Users who are interested in ShortcutsBench compare it to the repositories listed below.
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction ☆148 Updated this week
- Reproducing R1 for Code with Reliable Rewards ☆258 Updated 4 months ago
- A Comprehensive Benchmark for Software Development. ☆113 Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training. ☆194 Updated 3 months ago
- ☆85 Updated 6 months ago
- ☆131 Updated 2 weeks ago
- Paper list for Personal LLM Agents ☆412 Updated last year
- [ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet ☆179 Updated 3 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ☆110 Updated 3 weeks ago
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆42 Updated 9 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆32 Updated 7 months ago
- ☆183 Updated 2 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆81 Updated last year
- Survey Paper List - Efficient LLM and Foundation Models ☆257 Updated last year
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆163 Updated last week
- ☆113 Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆128 Updated 7 months ago
- Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders" ☆105 Updated 4 months ago
- [ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline. ☆55 Updated 2 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆284 Updated last week
- Neural Code Intelligence Survey 2024; Reading lists and resources ☆268 Updated 2 months ago
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆88 Updated 7 months ago
- ☆67 Updated 5 months ago
- Course information for CS598 Topics in LLM Agents (Spring '25) under the direction of Prof. Jiaxuan You (jiaxuan@illinois.edu). ☆34 Updated 4 months ago
- A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble" ☆123 Updated this week
- ☆23 Updated 3 months ago
- awesome llm plaza: daily tracking all sorts of awesome topics of llm, e.g. llm for coding, robotics, reasoning, multimod etc. ☆210 Updated last week
- [TOSEM'25] The official GitHub page for the survey paper "A Survey on Large Language Models for Code Generation". ☆153 Updated 2 months ago
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆90 Updated 2 years ago
- ☆65 Updated 10 months ago