Muhtasham / llm-inference-simulator
LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.
☆13 Updated 6 months ago
Alternatives and similar repositories for llm-inference-simulator
Users who are interested in llm-inference-simulator are comparing it to the libraries listed below.
- vLLM adapter for a TGIS-compatible gRPC server. ☆51 Updated this week
- ☆47 Updated 9 months ago
- The backend behind the LLM-Perf Leaderboard ☆11 Updated last year
- ☆28 Updated this week
- Estimating hardware and cloud costs of LLMs and transformer projects ☆20 Updated 3 weeks ago
- ☆102 Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆89 Updated this week
- ☆31 Updated 9 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆18 Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 Updated 4 months ago
- Easy, Fast, and Scalable Multimodal AI ☆109 Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆141 Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 Updated 4 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆28 Updated 2 years ago
- Nsight Systems In Docker ☆21 Updated 2 years ago
- LLM Serving Performance Evaluation Harness ☆83 Updated 11 months ago
- Compression for Foundation Models ☆35 Updated 6 months ago
- ☆20 Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 Updated 2 years ago
- ☆47 Updated last year
- Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from Google's paper `On the Theoretica… ☆15 Updated 5 months ago
- A curated list for Efficient Large Language Models ☆11 Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 Updated last year
- A collection of reproducible inference engine benchmarks ☆38 Updated 9 months ago
- AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2) ☆80 Updated last year
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆141 Updated 5 months ago
- Evaluation of bm42 sparse indexing algorithm ☆72 Updated last year
- ☆71 Updated 10 months ago
- Tutorial to get started with SkyPilot! ☆58 Updated last year
- Manages vllm-nccl dependency ☆17 Updated last year