andreyanufr / who_what_benchmark
☆20 · Updated 9 months ago
Alternatives and similar repositories for who_what_benchmark:
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools (see the first sketch after this list) ☆460 · Updated this week
- Tools for easier OpenVINO development/debugging ☆9 · Updated last month
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆295 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs (see the vLLM sketch after this list) ☆262 · Updated 6 months ago
- Neural Network Compression Framework for enhanced OpenVINO™ inference (see the NNCF sketch after this list) ☆1,002 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated 7 months ago
- Run Generative AI models with simple C++/Python API and using OpenVINO Runtime ☆260 · Updated this week
- Easy and Efficient Quantization for Transformers ☆197 · Updated 2 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆185 · Updated this week
- Software Development Kit (SDK) for the Intel® Geti™ platform for Computer Vision AI model training. ☆83 · Updated last week
- Advanced Quantization Algorithm for LLMs/VLMs. ☆438 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 9 months ago
- Reference models for Intel® Gaudi® AI Accelerator ☆162 · Updated 2 weeks ago
- OpenVINO Tokenizers extension ☆32 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆791 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,251 · Updated this week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆286 · Updated last week
- Examples for using ONNX Runtime for model training. ☆332 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆424 · Updated 3 months ago
- Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes ☆382 · Updated 9 months ago
- ☆45 · Updated 3 years ago
- The Triton backend for the ONNX Runtime. ☆140 · Updated last week
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆786 · Updated 2 months ago
- OpenVINO™ Explainable AI (XAI) Toolkit: Visual Explanation for OpenVINO Models ☆32 · Updated last month
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆809 · Updated 7 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆67 · Updated this week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆254 · Updated 9 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models ☆472 · Updated this week
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆106 · Updated 6 months ago
- A PyTorch quantization backend for Optimum ☆922 · Updated last week
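
A minimal usage sketch for the Optimum Intel entry above, assuming the `optimum[openvino]` extra is installed; the model ID is an arbitrary example, not anything the listing prescribes:

```python
# Export a Hugging Face model to OpenVINO IR on the fly and run it
# through the familiar Transformers-style generate() API.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # arbitrary example model, swap in your own
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("OpenVINO accelerates", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```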
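The two vLLM entries describe the same engine; a minimal offline-inference sketch, assuming `vllm` is installed and the example model fits in memory:

```python
# Batch offline generation with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # arbitrary small example model
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The benefits of low-bit quantization are"], params)
print(outputs[0].outputs[0].text)
```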
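For the NNCF entry, post-training quantization centers on `nncf.quantize` with a calibration dataset; a hedged sketch in which the IR path and calibration samples are placeholders you would supply:

```python
# Post-training INT8 quantization of an OpenVINO model with NNCF.
import nncf
import openvino as ov

ov_model = ov.Core().read_model("model.xml")  # placeholder IR path

# Calibration samples must match the model's expected inputs;
# this empty list is a placeholder, supply a few hundred real samples.
calibration_items = []
calib_dataset = nncf.Dataset(calibration_items, transform_func=lambda x: x)

quantized = nncf.quantize(ov_model, calib_dataset)
ov.save_model(quantized, "model_int8.xml")  # writes the quantized IR
```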