andreyanufr / who_what_benchmark
☆20 · Updated 2 months ago
Related projects:
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆380 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆144 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆231 · Updated last week
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆906 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆250 · Updated this week
- Advanced Quantization Algorithm for LLMs. The official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" ☆205 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆342 · Updated 2 weeks ago
- Run Generative AI models using native OpenVINO C++ API ☆107 · Updated this week
- A PyTorch quantization backend for Optimum ☆758 · Updated this week
- Reference models for the Intel® Gaudi® AI Accelerator ☆152 · Updated this week
- Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes ☆364 · Updated 2 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆715 · Updated last month
- Software Development Kit (SDK) for the Intel® Geti™ platform for Computer Vision AI model training. ☆71 · Updated this week
- OpenVINO™ Explainable AI (XAI) Toolkit: Visual Explanation for OpenVINO Models ☆20 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆416 · Updated this week
- GenAI components at the microservice level; a GenAI service composer for building mega-services ☆46 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆129 · Updated last month
- Experiments with inference on Llama ☆106 · Updated 3 months ago
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆319 · Updated this week
- ☆276 · Updated 3 weeks ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆248 · Updated 10 months ago
- Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of the Triton Inference Serv… ☆419 · Updated last week
- The Triton backend for the ONNX Runtime. ☆122 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, distillat… ☆439 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆527 · Updated this week
- ML model optimization product to accelerate inference. ☆318 · Updated 5 months ago
- PyTorch native quantization and sparsity for training and inference ☆748 · Updated this week
- A curated list of OpenVINO-based AI projects ☆92 · Updated 3 weeks ago
- Accelerate PyTorch models with ONNX Runtime ☆353 · Updated 2 weeks ago
- Easy and Efficient Quantization for Transformers ☆172 · Updated 2 months ago