A high-throughput and memory-efficient inference and serving engine for LLMs
☆85 · Mar 3, 2026 · Updated this week
Alternatives and similar repositories for vllm-fork
Users interested in vllm-fork are comparing it to the libraries listed below.
- ☆17 · Feb 3, 2026 · Updated last month
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆14 · Jan 8, 2026 · Updated last month
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated 11 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆207 · Feb 23, 2026 · Updated last week
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆25 · Apr 15, 2025 · Updated 10 months ago
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆42 · Feb 3, 2025 · Updated last year
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆170 · Jan 8, 2026 · Updated last month
- ☆24 · Oct 9, 2025 · Updated 4 months ago
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆67 · Updated this week
- PM Workshop China ☆10 · Apr 11, 2019 · Updated 6 years ago
- ☆78 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 7 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆65 · Jun 30, 2025 · Updated 8 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆18 · Dec 19, 2024 · Updated last year
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆120 · Mar 6, 2024 · Updated last year
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆247 · Updated this week
- Example scripts and configuration files to install and configure IBM Storage Scale in a Vagrant environment ☆27 · Jan 6, 2026 · Updated last month
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆94 · Updated this week
- Setup and Installation Instructions for Habana binaries, docker image creation ☆28 · Jan 8, 2026 · Updated last month
- Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open… ☆723 · Updated this week
- ☆61 · Dec 18, 2024 · Updated last year
- The kernel module management operator builds, signs and loads kernel modules on OpenShift. ☆31 · Feb 19, 2026 · Updated last week
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 · Jan 28, 2026 · Updated last month
- QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead ☆32 · Jan 27, 2025 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 · Feb 16, 2026 · Updated 2 weeks ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 · Apr 6, 2024 · Updated last year
- vLLM performance dashboard ☆42 · Apr 26, 2024 · Updated last year
- PARADIS, a lightweight and flexible weather forecast model that tries to Keep It Simple. ☆26 · Feb 4, 2026 · Updated last month
- Explainable AI Tooling (XAI). XAI is used to discover and explain a model's prediction in a way that is interpretable to the user. Releva… ☆39 · Sep 22, 2025 · Updated 5 months ago
- ext_mpi_collectives ☆11 · Apr 1, 2025 · Updated 11 months ago
- Memory Topology for GPUs ☆17 · Feb 13, 2026 · Updated 2 weeks ago
- ☆39 · Oct 3, 2022 · Updated 3 years ago
- A benchmark suite for measuring HDF5 performance. ☆43 · Feb 24, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆114 · Updated this week
- Argonne Leadership Computing Facility OpenCL tutorial ☆10 · Aug 22, 2025 · Updated 6 months ago
- ☆10 · Feb 25, 2026 · Updated last week
- Performance Counter Reader ☆11 · Sep 14, 2022 · Updated 3 years ago
- DigitalOcean Gradient AI Platform SDK ☆16 · Feb 24, 2026 · Updated last week
- User Management Application built with Spring Boot, Thymeleaf & MySQL Database ☆12 · Dec 20, 2024 · Updated last year