vllm-project / vllm-openvino
☆20 · Updated last month
Alternatives and similar repositories for vllm-openvino
Users interested in vllm-openvino are comparing it to the libraries listed below.
- This repository contains Dockerfiles, scripts, YAML files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow … ☆48 · Updated last week
- Run Generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆303 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- OpenVINO Tokenizers extension ☆37 · Updated this week
- OpenVINO Intel NPU Compiler ☆60 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 3 months ago
- A curated list of OpenVINO-based AI projects ☆141 · Updated 2 weeks ago
- A high-speed and easy-to-use LLM serving framework for local deployment ☆112 · Updated 3 months ago
- OpenAI Triton backend for Intel® GPUs ☆193 · Updated this week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆160 · Updated this week
- AMD-related optimizations for transformer models ☆80 · Updated 3 weeks ago
- Fast and memory-efficient exact attention ☆80 · Updated last week
- ☆48 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆61 · Updated 2 weeks ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆251 · Updated 2 weeks ago
- ☆84 · Updated last month
- DeepSeek-V3/R1 inference performance simulator ☆155 · Updated 3 months ago
- Library for modelling performance costs of different Neural Network workloads on NPU devices ☆34 · Updated last month
- A GPU-driven system framework for scalable AI applications ☆117 · Updated 5 months ago
- ☆161 · Updated last week
- ☆428 · Updated last week
- RISC-V C and Triton AI-Benchmark ☆19 · Updated 8 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 4 months ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ☆30 · Updated 8 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA '25) ☆45 · Updated 2 months ago
- ☆80 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆77 · Updated this week
- ☆31 · Updated 5 months ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆48 · Updated 2 months ago
- This repo contains the documentation for the OPEA project ☆42 · Updated last week