KuntaiDu / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆13 · Feb 6, 2026 · Updated last week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- The driver for LMCache core to run in vLLM · ☆60 · Feb 4, 2025 · Updated last year
- [ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression · ☆32 · Aug 7, 2025 · Updated 6 months ago
- Active Learning with Partial Feedback, ICLR 2019 · ☆11 · Apr 27, 2020 · Updated 5 years ago
- ☆12 · Mar 31, 2021 · Updated 4 years ago
- ☆12 · Jul 24, 2024 · Updated last year
- Distributed SDDMM Kernel · ☆12 · Jul 8, 2022 · Updated 3 years ago
- This repository contains the code used in a publication 'Active Learning for Decision-Making from Imbalanced Observational Data', Iiris S… · ☆11 · May 14, 2019 · Updated 6 years ago
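- Huawei collective communication performance testing · ☆15 · May 27, 2024 · Updated last year
- Machine learning experiment - linear regression - predicting continuous values · ☆11 · Aug 11, 2017 · Updated 8 years ago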
- ☆13 · Jan 7, 2025 · Updated last year
- A Distributed Analysis and Benchmarking Framework for Apache OpenWhisk Serverless Platform · ☆12 · Dec 11, 2018 · Updated 7 years ago
- Compress BiSeNet with Structure Knowledge Distillation for Real-time image segmentation on wali-TX2 · ☆11 · Jul 29, 2020 · Updated 5 years ago
- A simple tool for parsing the profile.json file of mxnet · ☆14 · Aug 1, 2018 · Updated 7 years ago
- Crawl phone information from Taobao and JD, clean those raw data. Use those data to analyze and compare prices of different phone models.… · ☆13 · Apr 15, 2020 · Updated 5 years ago
- The audio player for Flutter with a heart of gold · ☆13 · May 13, 2023 · Updated 2 years ago
- ☆12 · Jun 3, 2019 · Updated 6 years ago
- Source Code for Partial Interference · ☆10 · Dec 17, 2022 · Updated 3 years ago
- Simple starter CMake project that uses NVBench · ☆15 · May 6, 2025 · Updated 9 months ago
- PyTorch: training an EfficientNet model with pseudo-labels · ☆11 · Dec 28, 2019 · Updated 6 years ago
- Practical example using python to train a decision tree · ☆11 · Jul 27, 2016 · Updated 9 years ago
- draw object rect and add some properties · ☆11 · May 28, 2018 · Updated 7 years ago
- CUDA C simple application for Nvidia's GPU · ☆11 · Jun 7, 2022 · Updated 3 years ago
- Mathematical expression evaluator with just in time code generation · ☆12 · Apr 7, 2013 · Updated 12 years ago
- Example of applying CUDA graphs to LLaMA-v2 · ☆12 · Aug 25, 2023 · Updated 2 years ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend · ☆11 · Feb 5, 2025 · Updated last year
- Cluster management tools for the Hydro stack · ☆18 · Feb 5, 2021 · Updated 5 years ago
- Inline PTX Assembly in CUDA example · ☆13 · May 7, 2022 · Updated 3 years ago
- A notebook showing how to easily convert a current notebook you have to a notebook that can be run on Kubeflow Pipelines · ☆15 · Jul 15, 2020 · Updated 5 years ago
- A simple LaTeX template for CUHK thesis · ☆13 · Apr 24, 2023 · Updated 2 years ago
- [MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models · ☆16 · May 5, 2023 · Updated 2 years ago
- ☆16 · May 19, 2025 · Updated 8 months ago
- LLM serving cluster simulator · ☆135 · Apr 25, 2024 · Updated last year
- Python bindings for OpenSHMEM · ☆25 · Jan 13, 2026 · Updated last month
- ☆11 · Sep 7, 2024 · Updated last year
- A list of useful stuff in Machine Learning, Computer Graphics, Software Development, ... · ☆17 · Nov 14, 2022 · Updated 3 years ago
- Zoom in Lesions for Better Diagnosis: Attention Guided Deformation Network for WCE Image Classification · ☆13 · Aug 4, 2020 · Updated 5 years ago
- The code for paper 'Hierarchical Policy for Non-prehensile Multi-object Rearrangement with Deep Reinforcement Learning and Monte Carlo Tr… · ☆21 · Aug 18, 2023 · Updated 2 years ago
- Baidu Hook · ☆13 · Jan 7, 2016 · Updated 10 years ago
- Python utility to convert PyTorch model weights from '.bin' to '.safetensors' format · ☆17 · Sep 19, 2025 · Updated 4 months ago