huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆63, updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- Fast low-bit matmul kernels in Triton (☆365, updated this week)
- A safetensors extension to efficiently store sparse quantized tensors on disk (☆161, updated this week)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆266, updated 11 months ago)
- Load compute kernels from the Hub (☆283, updated this week)
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU (☆631, updated this week)
- Supporting PyTorch models with the Google AI Edge TFLite runtime (☆785, updated this week)
- This repository contains the experimental PyTorch native float8 training UX (☆224, updated last year)
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) (☆194, updated this week)
- AI Edge Quantizer: flexible post-training quantization for LiteRT models (☆65, updated this week)
- Inference of Vision Transformer (ViT) in plain C/C++ with ggml (☆294, updated last year)
- Use safetensors with ONNX 🤗 (☆69, updated 2 weeks ago)
- Explore training for quantized models (☆24, updated 2 months ago)
- VPTQ, a flexible and extreme low-bit quantization algorithm (☆655, updated 4 months ago)
- Python bindings for ggml (☆146, updated last year)
- Applied AI experiments and examples for PyTorch (☆294, updated 3 weeks ago)
- Official implementation of Half-Quadratic Quantization (HQQ) (☆878, updated last week)
- An innovative library for efficient LLM inference via low-bit quantization (☆348, updated last year)
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python (☆381, updated last week)
- Scalable and performant data loading (☆302, updated this week)
- High-performance SGEMM on CUDA devices (☆101, updated 7 months ago)
- LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI. Now with LiteRT Next, we're exp… (☆801, updated this week)
- Cataloging released Triton kernels (☆257, updated last week)
- A tool to configure, launch, and manage your machine learning experiments (☆190, updated this week)
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers with full support of O… (☆315, updated last week)
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference (☆72, updated this week)
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) (☆401, updated 3 weeks ago)
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment (☆673, updated last month)
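Many of the repositories above (HQQ, VPTQ, BitBLAS, AI Edge Quantizer) center on low-bit weight quantization. As a minimal, library-agnostic sketch of the core idea — not the actual API of any project listed — here is symmetric per-tensor 4-bit quantization in plain numpy; the function names are illustrative only:

```python
import numpy as np

def quantize_sym_int4(w: np.ndarray):
    """Symmetric per-tensor 4-bit quantization: map floats to integers in [-8, 7].

    Illustrative sketch only; real libraries use per-group scales, zero-points,
    and packed storage (two 4-bit values per byte).
    """
    scale = np.max(np.abs(w)) / 7.0  # use the positive bound 7 so the grid is symmetric
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from integers and the shared scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_sym_int4(w)
w_hat = dequantize(q, s)
# In-range values reconstruct to within half a quantization step (scale / 2).
```

Per-group variants apply the same round-and-clip step to small blocks of weights (e.g. 32 or 128 values) with one scale each, trading a little storage for much lower reconstruction error.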