andreyanufr / who_what_benchmark
★20 · Updated 11 months ago
Alternatives and similar repositories for who_what_benchmark
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ★473 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ★304 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★264 · Updated 8 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ★137 · Updated 10 months ago
- An innovative library for efficient LLM inference via low-bit quantization ★349 · Updated 9 months ago
- Easy and Efficient Quantization for Transformers ★199 · Updated 4 months ago
- A pytorch quantization backend for optimum ★955 · Updated this week
- experiments with inference on llama ★104 · Updated last year
- Tools for easier OpenVINO development/debugging ★9 · Updated 3 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ★288 · Updated 4 months ago
- Official PyTorch implementation of QA-LoRA ★137 · Updated last year
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ★188 · Updated this week
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ★238 · Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ★692 · Updated 10 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ★832 · Updated this week
- ★267 · Updated last week
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ★371 · Updated last year
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ★297 · Updated this week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ★499 · Updated this week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ★845 · Updated 9 months ago
- Inference server benchmarking tool ★73 · Updated last month
- OpenVINO™ Explainable AI (XAI) Toolkit: Visual Explanation for OpenVINO Models ★32 · Updated 3 months ago
- Code for NeurIPS 2024 paper: QuaRot, an end-to-end 4-bit inference of large language models. ★395 · Updated 6 months ago
- Google TPU optimizations for transformers models ★113 · Updated 5 months ago
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ★995 · Updated this week
- For releasing code related to compression methods for transformers, accompanying our publications ★431 · Updated 5 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ★311 · Updated last month
- Let's build better datasets, together! ★259 · Updated 6 months ago
- The repository for the code of the UltraFastBERT paper ★516 · Updated last year
- Fast low-bit matmul kernels in Triton ★322 · Updated this week