Muhtasham / llm-inference-simulator
🚀 LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.
☆12 · Updated 3 weeks ago
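The repository's pitch — compute-bound prefill versus memory-bound decode — can be illustrated with a minimal roofline-style cost model. The sketch below is illustrative only: the hardware numbers are assumed (roughly A100-class), and `prefill_time_s` / `decode_time_s` are hypothetical helpers, not part of the simulator's actual API.

```python
# Minimal roofline-style sketch of the two LLM inference phases:
# prefill (compute-bound) and decode (memory-bound).
# All numbers are illustrative assumptions, not measurements.

def prefill_time_s(n_params: float, prompt_tokens: int,
                   peak_flops: float) -> float:
    """Prefill processes the whole prompt in one pass; cost is dominated
    by compute: roughly 2 FLOPs per parameter per token."""
    flops = 2 * n_params * prompt_tokens
    return flops / peak_flops

def decode_time_s(n_params: float, new_tokens: int,
                  mem_bw_bytes: float, bytes_per_param: float = 2.0) -> float:
    """Each decode step streams all weights from memory once, so cost
    is dominated by memory bandwidth rather than FLOPs."""
    bytes_per_step = n_params * bytes_per_param
    return new_tokens * bytes_per_step / mem_bw_bytes

# Example: 7B-parameter model, fp16 weights, 1e15 FLOP/s compute,
# 2e12 B/s memory bandwidth (assumed, A100-like figures).
p = prefill_time_s(7e9, prompt_tokens=512, peak_flops=1e15)
d = decode_time_s(7e9, new_tokens=128, mem_bw_bytes=2e12)
print(f"prefill ≈ {p*1e3:.1f} ms, decode ≈ {d*1e3:.1f} ms")
# prefill ≈ 7.2 ms, decode ≈ 896.0 ms
```

Note how decoding 128 tokens costs two orders of magnitude more time than prefilling a 512-token prompt: this asymmetry is why the two phases are modeled separately.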
Alternatives and similar repositories for llm-inference-simulator
Users interested in llm-inference-simulator are comparing it to the libraries listed below.
- vLLM adapter for a TGIS-compatible gRPC server. ☆34 · Updated this week
- Make Triton easier ☆47 · Updated last year
- Estimating hardware and cloud costs of LLMs and transformer projects ☆18 · Updated last month
- 👷 Build compute kernels ☆93 · Updated this week
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated last year
- Python package for rocm-smi-lib ☆22 · Updated 3 weeks ago
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- A curated list for Efficient Large Language Models ☆11 · Updated last year
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated last year
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 4 months ago
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 5 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated 3 months ago
- Standalone command-line tool for compiling Triton kernels ☆17 · Updated 10 months ago
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions. ☆24 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai ☆77 · Updated last week
- The driver for LMCache core to run in vLLM ☆45 · Updated 6 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆99 · Updated 3 weeks ago
- Manage ML configuration with pydantic ☆16 · Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆109 · Updated 9 months ago
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆123 · Updated 8 months ago
- AskIt: Unified programming interface for LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2) ☆79 · Updated 7 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆80 · Updated 11 months ago
- QuIP quantization ☆55 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆50 · Updated this week
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 · Updated 6 months ago
- Official implementation for Training LLMs with MXFP4 ☆55 · Updated 3 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Updated 10 months ago