andreyanufr / who_what_benchmark
⭐20 · Updated 10 months ago
Alternatives and similar repositories for who_what_benchmark
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools · ⭐465 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… · ⭐301 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization · ⭐349 · Updated 8 months ago
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) · ⭐186 · Updated this week
- Tools for easier OpenVINO development/debugging · ⭐9 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐263 · Updated 7 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐71 · Updated this week
- Advanced quantization algorithm for LLMs/VLMs · ⭐460 · Updated this week
- Easy and efficient quantization for Transformers · ⭐197 · Updated 3 months ago
- Experiments with inference on Llama · ⭐104 · Updated 11 months ago
- ⭐255 · Updated last week
- Run generative AI models with a simple C++/Python API using the OpenVINO Runtime · ⭐274 · Updated this week
- Neural Network Compression Framework for enhanced OpenVINO™ inference · ⭐1,027 · Updated this week
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models · ⭐136 · Updated 9 months ago
- A PyTorch quantization backend for Optimum · ⭐935 · Updated 3 weeks ago
- Google TPU optimizations for Transformers models · ⭐109 · Updated 3 months ago
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks · ⭐283 · Updated last week
- Applied AI experiments and examples for PyTorch · ⭐267 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) · ⭐810 · Updated this week
- Notes on quantization in neural networks · ⭐82 · Updated last year
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… · ⭐922 · Updated this week
- Fast low-bit matmul kernels in Triton · ⭐301 · Updated this week
- Reference models for the Intel(R) Gaudi(R) AI Accelerator · ⭐162 · Updated 2 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components · ⭐195 · Updated this week
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ⭐272 · Updated 3 months ago
- Dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios · ⭐97 · Updated 9 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python · ⭐351 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ⭐173 · Updated last week
- Pipeline parallelism for PyTorch · ⭐765 · Updated 8 months ago
- Inference server benchmarking tool · ⭐59 · Updated 3 weeks ago