EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆80 · Updated last month
Alternatives and similar repositories for ShortcutsBench:
Users interested in ShortcutsBench are comparing it to the repositories listed below.
- Survey Paper List - Efficient LLM and Foundation Models ☆238 · Updated 4 months ago
- ☆99 · Updated last year
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆81 · Updated last year
- ☆57 · Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆143 · Updated 4 months ago
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap… ☆219 · Updated last month
- ☆87 · Updated 4 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. ☆206 · Updated 2 months ago
- Papers and accompanying code for AI systems ☆269 · Updated 3 weeks ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆292 · Updated last week
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆37 · Updated 3 months ago
- Multi-Candidate Speculative Decoding ☆34 · Updated 9 months ago
- A curated list of high-quality papers on resource-efficient LLMs 🌱 ☆103 · Updated 2 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆157 · Updated this week
- ☆75 · Updated last month
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆68 · Updated last week
- ☆41 · Updated 2 months ago
- This repo contains the source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" ☆32 · Updated 6 months ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 · Updated 2 years ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆51 · Updated last week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆111 · Updated 11 months ago
- ☆52 · Updated 10 months ago
- ☆36 · Updated 2 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆222 · Updated 3 months ago
- Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆23 · Updated 3 months ago
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? ☆92 · Updated 3 months ago
- The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework". ☆57 · Updated 3 months ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆41 · Updated 3 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆136 · Updated 7 months ago
- Paper list for Personal LLM Agents ☆371 · Updated 9 months ago