Muhtasham / llm-inference-simulator

🚀 LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.

☆12

Alternatives and similar repositories for llm-inference-simulator

Users who are interested in llm-inference-simulator are comparing it to the libraries listed below.

opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆41 Updated this week
UmerHA / triton_util
Make Triton easier
☆47 Updated last year
isEmmanuelOlowe / llm-cost-estimator
Estimating hardware and cloud costs of LLMs and transformer projects
☆18 Updated 3 months ago
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆110 Updated 11 months ago
zenrran4nlp / Awesome-LLM-Inference-Serving
☆43 Updated 5 months ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66 Updated 6 months ago
merrymercy / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆11 Updated last year
frinkleko / LIMIT-Sparse-Embedding
Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from google's paper `On the Theoretica…
☆15 Updated last month
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆33 Updated 5 months ago
logikon-ai / cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
☆18 Updated 8 months ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆33 Updated 7 months ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆83 Updated last week
IlyasMoutawwakil / llm-perf-backend
The backend behind the LLM-Perf Leaderboard
☆10 Updated last year
NolanoOrg / SpectraSuite
☆51 Updated last year
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 Updated 2 years ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆128 Updated 10 months ago
InfiniTensor / ninetoothed
A domain-specific language (DSL) based on Triton but providing higher-level abstractions.
☆30 Updated last week
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆33 Updated 2 weeks ago
facebookresearch / DIG-In
This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.
☆20 Updated last year
IST-DASLab / Quartet
☆98 Updated last month
SqueezeAILab / open_source_projects
Open Source Projects from Pallas Lab
☆21 Updated 3 years ago
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆123 Updated 3 weeks ago
leimao / Nsight-Systems-Docker-Image
Nsight Systems In Docker
☆20 Updated last year
michaelfeil / candle-flash-attn-v3
☆12 Updated 8 months ago
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆29 Updated 9 months ago
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 Updated 2 years ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆82 Updated last year
facebookresearch / fastgen
Simple high-throughput inference library
☆139 Updated 4 months ago
basetenlabs / Workshop-TRT-LLM
☆19 Updated last year
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆49 Updated last week