andreyanufr / who_what_benchmark
★21 · Updated last year
Alternatives and similar repositories for who_what_benchmark
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ★522 · Updated this week (minimal usage sketch after the list)
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ★325 · Updated 3 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ★203 · Updated last week
- An innovative library for efficient LLM inference via low-bit quantization ★351 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ★267 · Updated 3 weeks ago (minimal usage sketch after the list)
- A pytorch quantization backend for optimum ★1,018 · Updated last month (minimal usage sketch after the list)
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, … ★2,558 · Updated this week
- Tools for easier OpenVINO development/debugging ★10 · Updated 5 months ago
- This repository contains tutorials and examples for Triton Inference Server ★813 · Updated 3 weeks ago
- Neural Network Compression Framework for enhanced OpenVINO™ inference ★1,112 · Updated this week
- Easy and Efficient Quantization for Transformers ★202 · Updated 6 months ago
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ★377 · Updated 5 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ★903 · Updated last week
- Prune a model while finetuning or training. ★405 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★85 · Updated last week
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra… ★785 · Updated this week
- Efficiently find the best-suited language model (LM) for your NLP task ★132 · Updated 5 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ★169 · Updated 3 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ★833 · Updated 4 months ago
- The Triton TensorRT-LLM Backend ★910 · Updated last week
- ★20 · Updated 7 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ★501 · Updated last week
- ★322 · Updated last week
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ★113 · Updated last year
- Examples for using ONNX Runtime for model training. ★358 · Updated last year
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ★73 · Updated 3 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ★217 · Updated 2 weeks ago
- A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresse… ★1,720 · Updated this week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ★962 · Updated last year
- Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native. ★510 · Updated 8 months ago
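
A few of these libraries can be tried in a handful of lines. First, a minimal sketch of OpenVINO-accelerated generation with 🤗 Optimum Intel, assuming the `optimum[openvino]` extra is installed; the model ID is an arbitrary small example:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # illustrative model; any causal-LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Hello, OpenVINO!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```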
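Next, an offline-inference sketch for the vLLM serving engine, assuming a CUDA-capable GPU and the `vllm` package; the model choice is again illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative model id
params = SamplingParams(temperature=0.8, max_tokens=32)

# generate() batches the prompts and returns one RequestOutput per prompt.
for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```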
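Finally, a weight-only int8 quantization sketch for the pytorch quantization backend for optimum (published as `optimum-quanto`); `gpt2` is again just a small stand-in model:

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Replace eligible Linear layers with int8 weight-only quantized versions.
quantize(model, weights=qint8)

# freeze() materializes the quantized weights and drops the float originals.
freeze(model)
```

After freezing, the model can be used for inference as usual; keeping quantization weight-only (no activation quantization) trades a little speed for simplicity and accuracy.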