andreyanufr / who_what_benchmark
☆20 · Updated 8 months ago
Alternatives and similar repositories for who_what_benchmark:
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆444 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆260 · Updated 4 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆290 · Updated last month
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆978 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆173 · Updated this week
- Easy and Efficient Quantization for Transformers ☆192 · Updated 3 weeks ago
- experiments with inference on llama ☆104 · Updated 8 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆760 · Updated last week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 7 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆409 · Updated last month
- Advanced Quantization Algorithm for LLMs/VLMs. ☆379 · Updated this week
- Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations" ☆218 · Updated 2 weeks ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆559 · Updated last week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆742 · Updated 6 months ago
- OpenVINO™ Explainable AI (XAI) Toolkit: Visual Explanation for OpenVINO Models ☆32 · Updated 5 months ago
- A PyTorch quantization backend for Optimum ☆891 · Updated last month
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆679 · Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆56 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,032 · Updated this week
- Prune a model while finetuning or training. ☆399 · Updated 2 years ago
- A throughput-oriented high-performance serving framework for LLMs ☆745 · Updated 5 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆205 · Updated this week
- A library for researching neural networks compression and acceleration methods. ☆140 · Updated 6 months ago
- Official PyTorch implementation of QA-LoRA ☆127 · Updated 11 months ago
- LLM KV cache compression made easy ☆412 · Updated last week
- This repository contains tutorials and examples for Triton Inference Server ☆656 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆188 · Updated this week
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆392 · Updated this week