EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆85 Updated last month
Alternatives and similar repositories for ShortcutsBench:
Users who are interested in ShortcutsBench are comparing it to the repositories listed below:
- Survey Paper List - Efficient LLM and Foundation Models ☆241 Updated 6 months ago
- ☆98 Updated last year
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆85 Updated last year
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆35 Updated 4 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆23 Updated last month
- ☆47 Updated 4 months ago
- Papers and their code for AI systems ☆289 Updated this week
- ☆95 Updated 6 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆46 Updated 5 months ago
- Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" ☆29 Updated last year
- Federated Learning Systems Paper List ☆71 Updated last year
- Since the emergence of ChatGPT in 2022, the acceleration of large language models has become increasingly important. Here is a list of pap… ☆240 Updated last month
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆19 Updated 2 weeks ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆23 Updated 11 months ago
- ☆41 Updated 4 months ago
- One-size-fits-all model for mobile AI, a novel paradigm for mobile AI in which the OS and hardware co-manage a foundation model that is c… ☆25 Updated last year
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆99 Updated last month
- ☆19 Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆152 Updated 6 months ago
- ☆81 Updated 3 years ago
- ☆91 Updated 3 months ago
- Reproducing R1 for Code with Reliable Rewards ☆152 Updated last week
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ☆56 Updated 8 months ago
- a curated list of high-quality papers on resource-efficient LLMs 🌱 ☆111 Updated 3 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆177 Updated last month
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆113 Updated last year
- Simple extension on vLLM to help you speed up reasoning model without training. ☆142 Updated last month
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆23 Updated 4 months ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 Updated 2 years ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆156 Updated 9 months ago