A high-throughput and memory-efficient inference and serving engine for LLMs
☆85 · Mar 18, 2026 · Updated last week
Alternatives and similar repositories for vllm-fork
Users interested in vllm-fork are comparing it to the repositories listed below.
- ☆17 · Mar 4, 2026 · Updated 3 weeks ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆14 · Jan 8, 2026 · Updated 2 months ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆170 · Jan 8, 2026 · Updated 2 months ago
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆42 · Feb 3, 2025 · Updated last year
- PM Workshop China ☆10 · Apr 11, 2019 · Updated 6 years ago
- GenAI components at the micro-service level; a GenAI service composer for creating mega-services ☆195 · Updated this week
- ☆24 · Oct 9, 2025 · Updated 5 months ago
- ☆80 · Mar 18, 2026 · Updated last week
- Setup and installation instructions for Habana binaries and Docker image creation ☆28 · Jan 8, 2026 · Updated 2 months ago
- ☆10 · Jul 31, 2025 · Updated 7 months ago
- Intel Gaudi's Megatron-DeepSpeed Large Language Models for training ☆18 · Dec 19, 2024 · Updated last year
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆120 · Mar 6, 2024 · Updated 2 years ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on the Intel GPU (XPU) device. Note… ☆65 · Jun 30, 2025 · Updated 8 months ago
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 7 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆266 · Updated this week
- OpenVINO LLM Benchmark ☆11 · Dec 7, 2023 · Updated 2 years ago
- Official Code Repository for the paper "KALA: Knowledge-Augmented Language Model Adaptation" (NAACL 2022) ☆35 · Oct 17, 2023 · Updated 2 years ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆94 · Updated this week
- ☆20 · Mar 3, 2026 · Updated 3 weeks ago
- Intel Graphics System Firmware Update Library (IGSC FUL) is a pure C low-level library that exposes a required API to perform a firmware … ☆77 · Jan 15, 2026 · Updated 2 months ago
- AI-agent application for generating AI children's picture books, based on GPT, LangChain, function calling, Stable Diffusion, and more ☆25 · Oct 11, 2023 · Updated 2 years ago
- ☆19 · Jul 24, 2025 · Updated 8 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆38 · Aug 29, 2025 · Updated 6 months ago
- ☆30 · Aug 31, 2022 · Updated 3 years ago
- ☆17 · Feb 3, 2026 · Updated last month
- Mini-Engine Demonstration of Combining XeSS with VRS Tier 2. ☆14 · Jan 26, 2026 · Updated last month
- vLLM performance dashboard ☆43 · Apr 26, 2024 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆220 · Mar 16, 2026 · Updated last week
- Benchmark Suite Invocation Scripting ☆11 · Mar 16, 2022 · Updated 4 years ago
- ☆158 · Mar 12, 2026 · Updated last week
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆553 · Updated this week
- Building LaTeX packages using Travis-CI ☆12 · Dec 21, 2019 · Updated 6 years ago
- SPDK fork of nvme-cli. No longer supported; use standard nvme-cli with SPDK NVMe CUSE instead. See https://spdk.io/doc/nvme.html#nvme_… ☆15 · Apr 10, 2024 · Updated last year
- Experimental projects related to TensorRT ☆121 · Updated this week
- Official code for "Federated learning for heterogeneous electronic health record systems with cost effective participant selection" ☆12 · Feb 11, 2026 · Updated last month
- ☆15 · Mar 17, 2026 · Updated last week
- ☆60 · Dec 18, 2024 · Updated last year