huggingface / optimum-executorch
🤗 Optimum ExecuTorch
⭐101 · Updated last week
Alternatives and similar repositories for optimum-executorch
Users that are interested in optimum-executorch are comparing it to the libraries listed below
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ⭐88 · Updated last week
- Use safetensors with ONNX 🤗 ⭐81 · Updated last week
- Supporting PyTorch models with the Google AI Edge TFLite runtime ⭐903 · Updated this week
- Python bindings for ggml ⭐146 · Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ⭐200 · Updated 3 months ago
- A tool to configure, launch and manage your machine learning experiments ⭐214 · Updated this week
- Google TPU optimizations for transformers models ⭐132 · Updated last month
- LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via e… ⭐1,289 · Updated this week
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ⭐112 · Updated 3 weeks ago
- Visualize ONNX models with model-explorer ⭐66 · Updated last week
- Scalable and Performant Data Loading ⭐362 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ⭐233 · Updated last week
- ⭐218 · Updated 11 months ago
- Load compute kernels from the Hub ⭐376 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐267 · Updated last month
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ⭐305 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python ⭐418 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ⭐204 · Updated last week
- 👷 Build compute kernels ⭐213 · Updated this week
- AMD-related optimizations for transformer models ⭐96 · Updated 3 months ago
- Open-source reproducible benchmarks from Argmax ⭐77 · Updated this week
- Easy and Efficient Quantization for Transformers ⭐202 · Updated 6 months ago
- Fast low-bit matmul kernels in Triton ⭐423 · Updated last month
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ⭐155 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ⭐905 · Updated last month
- ⭐91 · Updated last year
- Thin wrapper around GGML to make life easier ⭐42 · Updated 2 months ago
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ⭐815 · Updated this week
- Inference of Mamba models in pure C ⭐195 · Updated last year
- MLX support for the Open Neural Network Exchange (ONNX) ⭐63 · Updated last year