tensorchord / inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
☆27 · Updated last year
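For context, the kind of measurement such an online-serving benchmark performs is sketched below: concurrent requests are sent to an HTTP inference endpoint and end-to-end latency percentiles and throughput are reported. This is a minimal, hypothetical illustration; the endpoint URL, payload, and parameters are placeholders and not part of inference-benchmark's actual interface.

```python
# Minimal sketch of an online-serving benchmark: send concurrent requests to a
# (hypothetical) HTTP inference endpoint and report latency percentiles and throughput.
# ENDPOINT and PAYLOAD are placeholders, not this repository's actual interface.
import json
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import request

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical serving endpoint
PAYLOAD = json.dumps({"prompt": "Hello", "max_tokens": 16}).encode()
NUM_REQUESTS = 64
CONCURRENCY = 8

def one_request(_: int) -> float:
    """Send a single request and return its end-to-end latency in seconds."""
    req = request.Request(ENDPOINT, data=PAYLOAD,
                          headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(one_request, range(NUM_REQUESTS)))
    wall = time.perf_counter() - wall_start
    # Report the usual serving metrics: median and tail latency, plus request throughput.
    print(f"p50 latency: {statistics.median(latencies) * 1000:.1f} ms")
    print(f"p95 latency: {statistics.quantiles(latencies, n=20)[18] * 1000:.1f} ms")
    print(f"throughput:  {NUM_REQUESTS / wall:.1f} req/s")
```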
Alternatives and similar repositories for inference-benchmark:
Users interested in inference-benchmark are comparing it to the libraries listed below.
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 · Updated 2 years ago
- Some microbenchmarks and design docs before commencement ☆12 · Updated 3 years ago
- Make triton easier ☆42 · Updated 7 months ago
- Sentence Embedding as a Service ☆14 · Updated last year
- ☆41 · Updated 2 months ago
- Distributed ML Optimizer ☆30 · Updated 3 years ago
- Benchmark suite for LLMs from Fireworks.ai ☆64 · Updated last month
- Open-sourced backend for Martian's LLM Inference Provider Leaderboard ☆17 · Updated 5 months ago
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆15 · Updated last year
- Manages vllm-nccl dependency ☆16 · Updated 7 months ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- TensorRT LLM Benchmark Configuration ☆12 · Updated 5 months ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 · Updated last year
- Self-host LLMs with LMDeploy and BentoML ☆17 · Updated 3 weeks ago
- ☆9 · Updated last year
- This repository contains statistics about AI infrastructure products. ☆18 · Updated 6 months ago
- Tutorial for LLM developers about engine design, service deployment, evaluation/benchmark, etc. Provides a C/S style optimized LLM inferen… ☆19 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 · Updated 4 months ago
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated 8 months ago
- Set up the env for vllm users ☆16 · Updated last year
- Train, tune, and infer Bamba model ☆76 · Updated this week
- Framework to achieve context distillation in LLMs ☆11 · Updated last year
- ☆114 · Updated 10 months ago
- Benchmarking PyTorch 2.0 different models ☆21 · Updated last year
- Simple dependency injection framework for Python ☆20 · Updated 8 months ago
- vLLM adapter for a TGIS-compatible gRPC server ☆15 · Updated this week
- The Efficiency Spectrum of LLM ☆52 · Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 · Updated last year
- ☆21 · Updated 2 months ago