☆31Apr 19, 2025Updated 10 months ago
Alternatives and similar repositories for vLLM-Benchmark
Users that are interested in vLLM-Benchmark are comparing it to the libraries listed below
- A collection of reproducible inference engine benchmarks☆38Apr 22, 2025Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆13Oct 10, 2025Updated 4 months ago
- CS294-162; Machine Learning Systems Seminar☆32Apr 11, 2023Updated 2 years ago
- ☆14Apr 8, 2023Updated 2 years ago
- ☆44Updated this week
- vLLM Daily Summarization of Merged PRs☆43Updated this week
- Benchmarking Optimizers for LLM Pretraining☆52Dec 30, 2025Updated 2 months ago
- ☆111Updated this week
- KV cache store for distributed LLM inference☆392Nov 13, 2025Updated 3 months ago
- A bunch of kernels that might make stuff slower 😉☆75Feb 18, 2026Updated last week
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 7 months ago
- Scoreboard for ONNX Backend Compatibility☆29Jan 24, 2026Updated last month
- Revision of official yolov7-pose to support custom dataset for keypoint detection☆11Nov 12, 2023Updated 2 years ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆36Aug 29, 2025Updated 6 months ago
- ☆30Jan 26, 2023Updated 3 years ago
- Modded vLLM to run pipeline parallelism over public networks☆40May 20, 2025Updated 9 months ago
- Run Slurm as a Kubernetes scheduler. A Slinky project.☆66Updated this week
- Ship correct and fast LLM kernels to PyTorch☆142Jan 14, 2026Updated last month
- Multi-GPU communication profiler and visualizer☆38Jun 10, 2024Updated last year
- ☆53Updated this week
- Use yolov5 to realize the road occupation operation and vehicle parking violation detection in urban streets, and can independently delin…☆12Jan 2, 2023Updated 3 years ago
- ☆28Dec 3, 2025Updated 2 months ago
- A github action for detecting a "trigger" in a pull request description or comment☆13Jun 13, 2025Updated 8 months ago
- Thoroughly annotated bilingual word2vec source code (well-annotated word2vec)☆10Oct 3, 2021Updated 4 years ago
- ☆53Updated this week
- Practical exercises for HOW Series "Deep Dive", a Web-based training on parallel programming and performance optimization☆33Feb 1, 2019Updated 7 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆97Sep 19, 2025Updated 5 months ago
- Polyp segmentation tool utilizing U-Net for accurate medical image analysis, designed to enhance early detection and diagnosis of colorec…☆11Feb 18, 2024Updated 2 years ago
- Terraform modules and Ansible playbook for Apache SkyWalking☆12Mar 11, 2024Updated last year
- Transformer related optimization, including BERT, GPT☆39Feb 10, 2023Updated 3 years ago
- [CVPRW 2025] Official code of "IAUNet: Instance-Aware U-Net"☆33Aug 22, 2025Updated 6 months ago
- ☆11Aug 17, 2014Updated 11 years ago
- Offline optimization of your disaggregated Dynamo graph☆195Updated this week
- A simplified flash-attention implementation using cutlass, intended for teaching purposes☆58Aug 12, 2024Updated last year
- Pytorch routines for (Ker)nel (Mac)hines☆10Oct 10, 2025Updated 4 months ago
- A simple implementation of an artificial neural network based on Apache Spark and Python. This is another implementation of my toy prog…☆11Jul 28, 2017Updated 8 years ago
- A simplified implementation inspired by Cline☆10Mar 11, 2025Updated 11 months ago
- ☆10Nov 25, 2022Updated 3 years ago