andreyanufr / who_what_benchmark
☆20 · Updated last year
Alternatives and similar repositories for who_what_benchmark
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆515 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆320 · Updated 2 months ago
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆1,109 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆201 · Updated this week
- Prune a model while finetuning or training. ☆404 · Updated 3 years ago
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆897 · Updated last month
- Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes ☆388 · Updated 6 months ago
- A PyTorch quantization backend for optimum ☆1,012 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with … ☆735 · Updated this week
- A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses… ☆1,605 · Updated this week
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, … ☆2,537 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆805 · Updated 3 weeks ago
- Easy and Efficient Quantization for Transformers ☆203 · Updated 5 months ago
- Explainable AI Tooling (XAI). XAI is used to discover and explain a model's prediction in a way that is interpretable to the user. Releva… ☆38 · Updated 2 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆830 · Updated 3 months ago
- ML model optimization product to accelerate inference. ☆324 · Updated 6 months ago
- A library for researching neural network compression and acceleration methods. ☆140 · Updated 3 months ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆710 · Updated last year
- The Triton TensorRT-LLM Backend ☆910 · Updated last week
- ☆319 · Updated last week
- Open-source library for scalable, reproducible evaluation of AI models and benchmarks. ☆106 · Updated this week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆500 · Updated this week
- Official PyTorch implementation of QA-LoRA ☆145 · Updated last year
- Top-level directory for documentation and general content ☆121 · Updated 6 months ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,296 · Updated last week
- Tools for easier OpenVINO development/debugging ☆10 · Updated 4 months ago
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ☆375 · Updated 5 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆355 · Updated 9 months ago
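
For orientation, the listed libraries produce compressed models, and who_what_benchmark scores how much a compressed model's outputs drift from the original's. Below is a minimal sketch of that workflow based on the upstream README; the `whowhatbench` API names (`Evaluator`, `score`) and the example model ID are assumptions that may differ across versions.

```python
# Minimal sketch, assuming the `whowhatbench` package exposes an Evaluator
# class with a score() method (per the upstream README; names may vary).
import whowhatbench
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

# Stand-in for a compressed variant of the same model, e.g. one produced
# by NNCF, optimum-quanto, or llm-compressor from the list above.
optimized_model = AutoModelForCausalLM.from_pretrained(model_id)

# Ground-truth answers are generated once from the base model; the
# optimized model is then scored by text similarity against them.
evaluator = whowhatbench.Evaluator(base_model=base_model, tokenizer=tokenizer)
metrics_per_prompt, metrics = evaluator.score(optimized_model)
print(metrics)
```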