tensorchord / inference-benchmarkLinks

Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)

☆28

Alternatives and similar repositories for inference-benchmark

Users that are interested in inference-benchmark are comparing it to the libraries listed below

Sorting:

Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆32Updated 3 months ago
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆18Updated 3 years ago
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆12Updated 4 years ago
npuichigo / blazing-fast-io-tutorial
Blazing fast data loading with HuggingFace Dataset and Ray Data
☆16Updated last year
withmartian / leaderboard-backend
Open sourced backend for Martian's LLM Inference Provider Leaderboard
☆19Updated 11 months ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆76Updated this week
anyscale / llm-continuous-batching-benchmarks
☆120Updated last year
vllm-project / vllm-nccl
Manages vllm-nccl dependency
☆17Updated last year
hpcaitech / CachedEmbedding
A memory efficient DLRM training solution using ColossalAI
☆105Updated 2 years ago
bentoml / simple_di
Simple dependency injection framework for Python
☆21Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆43Updated 7 months ago
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆101Updated 11 months ago
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆33Updated this week
UmerHA / triton_util
Make triton easier
☆47Updated last year
bentoml / sentence-embedding-bento
Sentence Embedding as a Service
☆15Updated last month
bentoml / llm-bench
☆55Updated 8 months ago
simon-mo / vLLM-Benchmark
☆30Updated 3 months ago
Snowflake-Labs / vllm
☆15Updated 4 months ago
vllm-project / recipes
Common recipes to run vLLM
☆68Updated last week
xlang-ai / batch-prompting
[EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.
☆75Updated last year
tanyuqian / redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆66Updated 7 months ago
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56Updated last week
facebookresearch / fastgen
Simple high-throughput inference library
☆125Updated 2 months ago
deepspeedai / DeepSpeed-Kernels
☆74Updated 4 months ago
triton-inference-server / pytorch_backend
The Triton backend for the PyTorch TorchScript models.
☆157Updated 2 weeks ago
coreweave / ml-containers
☆38Updated this week
ray-project / distml
Distributed ML Optimizer
☆32Updated 4 years ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆71Updated last year
pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆158Updated last month
ModelTC / awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large model
☆57Updated last year