andreyanufr / who_what_benchmark
☆20 · Updated 5 months ago
Alternatives and similar repositories for who_what_benchmark:
Users interested in who_what_benchmark are comparing it to the libraries listed below.
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆422 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆162 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆254 · Updated 2 months ago
- Run Generative AI models with simple C++/Python API and using OpenVINO Runtime ☆180 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆274 · Updated this week
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆956 · Updated this week
- Notes on quantization in neural networks ☆59 · Updated last year
- Easy and Efficient Quantization for Transformers ☆185 · Updated last week
- Experiments with inference on LLaMA ☆104 · Updated 6 months ago
- Advanced Quantization Algorithm for LLMs/VLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent … ☆271 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated 3 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆156 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models ☆136 · Updated 4 months ago
- Software Development Kit (SDK) for the Intel® Geti™ platform for Computer Vision AI model training ☆75 · Updated this week
- Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes ☆373 · Updated 4 months ago
- ML model optimization product to accelerate inference ☆321 · Updated 8 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆440 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆339 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆176 · Updated last week
- GenAI components at micro-service level; GenAI service composer to create mega-service ☆84 · Updated this week
- A curated list of OpenVINO based AI projects ☆113 · Updated 2 weeks ago
- The Triton backend for the ONNX Runtime ☆135 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆790 · Updated this week
- FSDL 2021 course project - Active Learning in NLP ☆5 · Updated 2 weeks ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime ☆388 · Updated this week
- Fast sparse deep learning on CPUs ☆51 · Updated 2 years ago
- A collection of all available inference solutions for the LLMs ☆74 · Updated 3 months ago
- Examples for using ONNX Runtime for model training ☆317 · Updated last month
- A PyTorch quantization backend for optimum ☆847 · Updated 3 weeks ago