vllm-project / vllm-openvino
☆15 · Updated 3 weeks ago
Alternatives and similar repositories for vllm-openvino
Users interested in vllm-openvino are comparing it to the libraries listed below.
- Tools for easier OpenVINO development/debugging ☆9 · Updated 3 months ago
- Run Generative AI models with simple C++/Python API and using OpenVINO Runtime ☆295 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆473 · Updated last week
- OpenVINO Tokenizers extension ☆36 · Updated last week
- ☆113 · Updated 2 months ago
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆525 · Updated this week
- GenAI components at micro-service level; GenAI service composer to create mega-service ☆160 · Updated this week
- llm-export can export LLM models to ONNX. ☆295 · Updated 5 months ago
- ☆427 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆76 · Updated this week
- Repository for OpenVINO's extra modules ☆129 · Updated this week
- ☆16 · Updated last year
- OpenVINO Intel NPU Compiler ☆58 · Updated last week
- Pre-built components and code samples to help you build and deploy production-grade AI applications with the OpenVINO™ Toolkit from Intel ☆155 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi ☆33 · Updated 3 months ago
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆228 · Updated 3 weeks ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆304 · Updated 3 weeks ago
- ONNX Runtime: cross-platform, high performance scoring engine for ML models ☆65 · Updated this week
- Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLa… ☆633 · Updated this week
- Intel® NPU Acceleration Library ☆680 · Updated 2 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 9 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆377 · Updated this week
- A curated list of OpenVINO based AI projects ☆138 · Updated 2 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,518 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆84 · Updated this week
- Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Inte… ☆718 · Updated this week
- ☆81 · Updated last week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆188 · Updated this week
- OpenVINO™ Explainable AI (XAI) Toolkit: Visual Explanation for OpenVINO Models ☆32 · Updated 3 months ago
- With OpenVINO Test Drive, users can run large language models (LLMs) and models trained by Intel Geti on their devices, including AI PCs … ☆26 · Updated 2 weeks ago