huggingface / optimum-executorch
🤗 Optimum ExecuTorch
⭐ 80 · Updated last week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ⭐ 76 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ⭐ 204 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐ 267 · Updated last year
- Fast low-bit matmul kernels in Triton ⭐ 395 · Updated 3 weeks ago
- Load compute kernels from the Hub ⭐ 327 · Updated last week
- Scalable and Performant Data Loading ⭐ 335 · Updated this week
- Advanced quantization toolkit for LLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with Transform… ⭐ 712 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. ⭐ 205 · Updated this week
- Use safetensors with ONNX 🤗 ⭐ 73 · Updated last month
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ⭐ 298 · Updated last year
- ⭐ 218 · Updated 9 months ago
- 👷 Build compute kernels ⭐ 178 · Updated this week
- A simple, hackable text-to-speech system in PyTorch and MLX ⭐ 184 · Updated 3 months ago
- Python bindings for ggml ⭐ 146 · Updated last year
- Thin wrapper around GGML to make life easier ⭐ 40 · Updated 2 weeks ago
- Official implementation of Half-Quadratic Quantization (HQQ) ⭐ 891 · Updated 3 weeks ago
- Model compression for ONNX ⭐ 98 · Updated last year
- Google TPU optimizations for transformers models ⭐ 122 · Updated 9 months ago
- Open-source reproducible benchmarks from Argmax ⭐ 68 · Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ⭐ 154 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ⭐ 196 · Updated 5 months ago
- Simple high-throughput inference library ⭐ 149 · Updated 6 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ⭐ 66 · Updated 7 months ago
- An innovative library for efficient LLM inference via low-bit quantization ⭐ 349 · Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ⭐ 163 · Updated 3 weeks ago
- [EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ⭐ 135 · Updated 6 months ago
- Experiments with BitNet inference on CPU ⭐ 54 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ⭐ 202 · Updated last year
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ⭐ 832 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ⭐ 223 · Updated last year