friendliai / LLMServingPerfEvaluatorLinks

☆48

Alternatives and similar repositories for LLMServingPerfEvaluator

Users that are interested in LLMServingPerfEvaluator are comparing it to the libraries listed below

Sorting:

swsnu / aisys2023
☆103Updated 2 years ago
friendliai / friendli-model-optimizer
FMO (Friendli Model Optimizer)
☆13Updated 9 months ago
friendliai / periflow-cli
Welcome to PeriFlow CLI ☁︎
☆12Updated 2 years ago
kakaobrain / trident
A performance library for machine learning applications.
☆184Updated 2 years ago
mlsys-seo / ooo-backprop
☆25Updated 2 years ago
friendliai / FAI-Model
FriendliAI Model Hub
☆91Updated 3 years ago
PyTorchKorea / pytorchcore-kr
PyTorch CoreSIG
☆57Updated 10 months ago
junstar92 / nvidia-libraries-study
☆54Updated 11 months ago
VIA-Research / vTrain
☆73Updated 5 months ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆202Updated 4 months ago
efficient-ai-study / efficient-ai-study
☆91Updated last year
HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
☆84Updated this week
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118Updated last year
kaist-ina / stellatrain
Official Github repository for the SIGCOMM '24 paper "Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs"
☆72Updated last year
sjquan / 2022-Study
☆56Updated 2 years ago
triton-inference-server / vllm_backend
☆304Updated this week
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆431Updated 2 weeks ago
etri / nest-compiler
NEST Compiler
☆118Updated 8 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆432Updated 5 months ago
ConstantPark / DL_Compiler
Study Group of Deep Learning Compiler
☆165Updated 2 years ago
cli99 / llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
☆461Updated 6 months ago
SqueezeBits / owlite
OwLite is a low-code AI model compression toolkit for AI models.
☆50Updated 5 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆288Updated this week
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆57Updated 2 months ago
42dot / 42dot_LLM
42dot LLM consists of a pre-trained language model, 42dot LLM-PLM, and a fine-tuned model, 42dot LLM-SFT, which is trained to respond to …
☆130Updated last year
likejazz / llama3.cuda
llama3.cuda is a pure C/CUDA implementation for Llama 3 model.
☆344Updated 6 months ago
run-ai / llmperf
☆58Updated last year
UpstageAI / evalverse
The Universe of Evaluation. All about the evaluation for LLMs.
☆228Updated last year
snuspl / pluto
MIST: High-performance IoT Stream Processing
☆17Updated 6 years ago
vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆138Updated this week