EachSheep / ShortcutsBench
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
☆78 · Updated last week
Alternatives and similar repositories for ShortcutsBench:
Users interested in ShortcutsBench are comparing it to the repositories listed below:
- ☆99 · Updated last year
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆81 · Updated last year
- Survey Paper List - Efficient LLM and Foundation Models ☆238 · Updated 3 months ago
- ☆81 · Updated 3 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆260 · Updated this week
- A curated list of high-quality papers on resource-efficient LLMs 🌱 ☆93 · Updated 2 weeks ago
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes. ☆191 · Updated last month
- Multi-Candidate Speculative Decoding ☆34 · Updated 8 months ago
- A Stream-based LLM Agent Framework for Continuous Context Sensing and Sharing ☆35 · Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆134 · Updated 3 months ago
- ☆43 · Updated 2 weeks ago
- ☆50 · Updated last month
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? ☆89 · Updated 3 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆214 · Updated 2 months ago
- ☆36 · Updated 4 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆70 · Updated 7 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆132 · Updated 6 months ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ☆53 · Updated 5 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆61 · Updated 3 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆151 · Updated 7 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆16 · Updated 3 weeks ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆17 · Updated 8 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆29 · Updated 2 months ago
- [Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆59 · Updated 4 months ago
- SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆31 · Updated last month
- The repo for In-context Autoencoder ☆103 · Updated 8 months ago
- ☆40 · Updated last month
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆205 · Updated 3 weeks ago
- Implements some methods of LLM KV Cache Sparsity ☆30 · Updated 7 months ago
- Course Material for the UG Course COMP4901Y ☆50 · Updated 8 months ago