lapp0 / lm-inference-enginesLinks

Comparison of Language Model Inference Engines

☆236

Alternatives and similar repositories for lm-inference-engines

Users that are interested in lm-inference-engines are comparing it to the libraries listed below

Sorting:

npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆218Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267Updated last year
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆203Updated 5 months ago
triton-inference-server / vllm_backend
☆317Updated last week
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320Updated 2 months ago
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆350Updated last year
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆958Updated last year
apoorvumang / prompt-lookup-decoding
☆581Updated last year
vectorch-ai / ScaleLLM
A high-performance inference system for large language models, designed for production environments.
☆486Updated 3 weeks ago
ray-project / llmperf-leaderboard
☆473Updated last year
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84Updated last week
punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,121Updated last year
huggingface / inference-benchmarker
Inference server benchmarking tool
☆130Updated 2 months ago
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆918Updated last month
Cornell-RelaxML / quip-sharp
☆570Updated last year
mlc-ai / llm-perf-bench
☆120Updated last year
neuralmagic / AutoFP8
☆205Updated 7 months ago
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆730Updated this week
Infini-AI-Lab / Sequoia
scalable and robust tree-based speculative decoding algorithm
☆363Updated 10 months ago
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆93Updated 9 months ago
fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
☆315Updated 2 years ago
bentoml / llm-bench
☆56Updated last year
Dan-wanna-M / formatron
Formatron empowers everyone to control the format of language models' output with minimal overhead.
☆231Updated 5 months ago
run-ai / llmperf
☆58Updated last year
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆139Updated last year
microsoft / VPTQ
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆668Updated 7 months ago
dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆894Updated last month
ray-project / llmperf
LLMPerf is a library for validating and benchmarking LLMs
☆1,057Updated 11 months ago
anyscale / llm-continuous-batching-benchmarks
☆122Updated last year
asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆78Updated last year