Muhtasham / llm-inference-simulator
LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.
⭐ 12 · Updated 2 months ago
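The prefill/decode split described above can be illustrated with a simple roofline estimate. This is a minimal sketch under stated assumptions, not the simulator's own model. It assumes about 2 FLOPs per parameter per token, that the weights are read from memory once per forward pass, and that KV-cache and activation traffic can be ignored. The `Hardware` class, the `roofline_pass` function, and the hardware numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    """Hypothetical accelerator spec (illustrative numbers, not measurements)."""
    peak_flops: float     # peak dense FLOP/s
    mem_bandwidth: float  # memory bandwidth in bytes/s


def roofline_pass(n_params: float, tokens: int, bytes_per_param: float,
                  hw: Hardware) -> tuple[float, str]:
    """Estimate the time of one forward pass over `tokens` tokens.

    Assumptions: ~2 FLOPs per parameter per token (one multiply-add per weight),
    weights streamed from memory once per pass, KV-cache/activation traffic ignored.
    Returns (estimated seconds, "compute" or "memory") for the binding limit.
    """
    flops = 2.0 * n_params * tokens
    bytes_moved = n_params * bytes_per_param
    t_compute = flops / hw.peak_flops
    t_memory = bytes_moved / hw.mem_bandwidth
    bound = "compute" if t_compute >= t_memory else "memory"
    return max(t_compute, t_memory), bound


if __name__ == "__main__":
    hw = Hardware(peak_flops=1e15, mem_bandwidth=2e12)  # hypothetical GPU
    # Prefill: a 2048-token prompt shares a single read of the weights.
    print("prefill", roofline_pass(7e9, 2048, 2, hw))
    # Decode: each step produces one token but still reads every weight.
    print("decode ", roofline_pass(7e9, 1, 2, hw))
```

Under these assumptions, arithmetic intensity is 2 × tokens / bytes_per_param FLOPs per byte. With the hypothetical ridge point of 1e15 / 2e12 = 500 FLOPs per byte and fp16 weights (2 bytes), a pass becomes compute-bound once it processes more than about 500 tokens. A long prefill therefore tends to be limited by compute, while single-token decode steps are limited by memory bandwidth.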
Alternatives and similar repositories for llm-inference-simulator
Users interested in llm-inference-simulator are comparing it to the libraries listed below:
- vLLM adapter for a TGIS-compatible gRPC server. ⭐ 41 · Updated this week
- Make Triton easier ⭐ 47 · Updated last year
- Estimating hardware and cloud costs of LLMs and transformer projects ⭐ 18 · Updated 3 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ⭐ 110 · Updated 11 months ago
- ⭐ 43 · Updated 5 months ago
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) ⭐ 66 · Updated 6 months ago
- A curated list for Efficient Large Language Models ⭐ 11 · Updated last year
- Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from Google's paper `On the Theoretica… ⭐ 15 · Updated last month
- A collection of reproducible inference engine benchmarks ⭐ 33 · Updated 5 months ago
- A framework for evaluating the effectiveness of chain-of-thought reasoning in language models. ⭐ 18 · Updated 8 months ago
- [WIP] Better (FP8) attention for Hopper ⭐ 33 · Updated 7 months ago
- Benchmark suite for LLMs from Fireworks.ai ⭐ 83 · Updated last week
- The backend behind the LLM-Perf Leaderboard ⭐ 10 · Updated last year
- ⭐ 51 · Updated last year
- Repository for CPU Kernel Generation for LLM Inference ⭐ 26 · Updated 2 years ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ⭐ 128 · Updated 10 months ago
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions. ⭐ 30 · Updated last week
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ⭐ 33 · Updated 2 weeks ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ⭐ 20 · Updated last year
- ⭐ 98 · Updated last month
- Open Source Projects from Pallas Lab ⭐ 21 · Updated 3 years ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ⭐ 123 · Updated 3 weeks ago
- Nsight Systems In Docker ⭐ 20 · Updated last year
- ⭐ 12 · Updated 8 months ago
- Nexusflow function call, tool use, and agent benchmarks. ⭐ 29 · Updated 9 months ago
- This is a new metric that can be used to evaluate the faithfulness of text generated by LLMs. The work behind this repository can be found he… ⭐ 31 · Updated 2 years ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ⭐ 82 · Updated last year
- Simple high-throughput inference library ⭐ 139 · Updated 4 months ago
- ⭐ 19 · Updated last year
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ⭐ 49 · Updated last week