huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆74 · Updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below:
- Use safetensors with ONNX 🤗 ☆73 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆73 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆183 · Updated this week
- Python bindings for ggml ☆146 · Updated last year
- Google TPU optimizations for transformers models ☆121 · Updated 9 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆198 · Updated last week
- Thin wrapper around GGML to make life easier ☆40 · Updated 4 months ago
- Fast low-bit matmul kernels in Triton ☆385 · Updated last week
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆103 · Updated last week
- Load compute kernels from the Hub ☆308 · Updated this week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. ☆679 · Updated this week
- ☆218 · Updated 9 months ago
- 👷 Build compute kernels ☆163 · Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last year
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆199 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆194 · Updated 4 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆295 · Updated last year
- Scalable and Performant Data Loading ☆330 · Updated this week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆193 · Updated last month
- Profile your CoreML models directly from Python 🐍 ☆29 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆58 · Updated 2 weeks ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- A simple, hackable text-to-speech system in PyTorch and MLX ☆176 · Updated 2 months ago
- Visualize ONNX models with model-explorer ☆62 · Updated 2 weeks ago
- Explore training for quantized models ☆25 · Updated 3 months ago
- Model compression for ONNX ☆97 · Updated 11 months ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆811 · Updated this week
- Open-source reproducible benchmarks from Argmax ☆65 · Updated last week