andreyanufr / who_what_benchmark
☆20 · Updated last year
Alternatives and similar repositories for who_what_benchmark
Users who are interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools (☆502 · Updated this week)
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… (☆318 · Updated last month)
- Easy and Efficient Quantization for Transformers (☆202 · Updated 4 months ago)
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) (☆199 · Updated last week)
- A PyTorch quantization backend for Optimum (☆999 · Updated last week)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆266 · Updated last year)
- Official implementation of Half-Quadratic Quantization (HQQ) (☆884 · Updated this week)
- Neural Network Compression Framework for enhanced OpenVINO™ inference (☆1,091 · Updated last week)
- An innovative library for efficient LLM inference via low-bit quantization (☆349 · Updated last year)
- 🦖 X—LLM: Cutting Edge & Easy LLM Finetuning (☆406 · Updated last year)
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization (☆704 · Updated last year)
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. (☆369 · Updated 3 months ago)
- For releasing code related to compression methods for transformers, accompanying our publications (☆447 · Updated 9 months ago)
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. (☆140 · Updated last year)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆83 · Updated this week)
- Official PyTorch implementation of QA-LoRA (☆141 · Updated last year)
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. (☆679 · Updated this week)
- Prune a model while finetuning or training. (☆405 · Updated 3 years ago)
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… (☆2,517 · Updated this week)
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. (☆823 · Updated 2 months ago)
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. (☆922 · Updated last year)
- Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes (☆389 · Updated 4 months ago)
- The Triton TensorRT-LLM Backend (☆903 · Updated this week)
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models (☆249 · Updated last year)
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… (☆2,164 · Updated last year)
- VPTQ, A Flexible and Extreme low-bit quantization algorithm (☆659 · Updated 6 months ago)
- This repository contains tutorials and examples for Triton Inference Server (☆792 · Updated 2 weeks ago)
- Late Interaction Models Training & Retrieval (☆632 · Updated this week)
- The repository for the code of the UltraFastBERT paper (☆518 · Updated last year)
- LLM Workshop by Sourab Mangrulkar (☆394 · Updated last year)