EachSheep / ShortcutsBench

ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents

☆107

Alternatives and similar repositories for ShortcutsBench

Users who are interested in ShortcutsBench are comparing it to the repositories listed below.

gta0804 / MASS
Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction
☆153 Updated last week
ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆272 Updated 6 months ago
MobileLLM / Personal_LLM_Agents_Survey
Paper list for Personal LLM Agents
☆421 Updated last year
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆207 Updated 5 months ago
open-compass / DevEval
A Comprehensive Benchmark for Software Development.
☆119 Updated last year
yanweiyue / AgentPrune
☆91 Updated 8 months ago
UbiquitousLearning / Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
☆257 Updated last year
GeniusHTX / TALE
☆136 Updated 2 months ago
MobileLLM / ChainStream
A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing
☆41 Updated last month
bingreeky / MaAS
[ICML'25 Oral] Multi-agent Architecture Search via Agentic Supernet
☆214 Updated 2 weeks ago
hemingkx / SWIFT
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆58 Updated 9 months ago
MIT-MI / MEM1
☆168 Updated last month
SkyworkAI / skywork-o1-prm-inference
☆65 Updated last year
uservan / speculative_thinking
☆29 Updated last month
KANABOON1 / MemGen
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
☆196 Updated 3 weeks ago
LCLM-Horizon / A-Comprehensive-Survey-For-Long-Context-Language-Modeling
A Comprehensive Survey on Long Context Language Modeling
☆204 Updated this week
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆97 Updated 9 months ago
junchenzhi / Awesome-LLM-Ensemble
A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble"
☆164 Updated last week
bytedance / FullStackBench
Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders"
☆107 Updated 6 months ago
TsinghuaC3I / MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
☆348 Updated last week
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆324 Updated 2 months ago
QiushiSun / Awesome-Code-Intelligence
Neural Code Intelligence Survey 2024; Reading lists and resources
☆276 Updated 4 months ago
hyx1999 / SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆36 Updated 9 months ago
zjunlp / WorfBench
[ICLR 2025] Benchmarking Agentic Workflow Generation
☆135 Updated 9 months ago
Blueyee / Efficient-CoT-LRMs
Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!
☆70 Updated 7 months ago
EIT-NLP / Awesome-Latent-CoT
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
☆194 Updated 2 weeks ago
BaohaoLiao / RSD
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
☆51 Updated 6 months ago
ZongqianLi / Prompt-Compression-Survey
[NAACL 2025 Main Selected Oral] Repository for the paper: Prompt Compression for Large Language Models: A Survey
☆35 Updated 6 months ago
zhenyuhe00 / SWE-Swiss
SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution
☆97 Updated 2 months ago
metame-ai / awesome-llm-plaza
awesome llm plaza: daily tracking all sorts of awesome topics of llm, e.g. llm for coding, robotics, reasoning, multimod etc.
☆211 Updated 3 weeks ago