huggingface / optimum-executorch
🤗 Optimum ExecuTorch
★93 · Updated last week
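The project exports 🤗 Transformers models to ExecuTorch for on-device inference. Below is a minimal sketch of that flow, assuming the `ExecuTorchModelForCausalLM` API and the `recipe="xnnpack"` export option shown in the project README; the model id, prompt, and the `text_generation` helper are illustrative and may differ across versions.

```python
# Minimal sketch (assumptions noted above): export a Transformers model to
# ExecuTorch and run text generation with optimum-executorch.
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Export on the fly with the XNNPACK backend recipe (assumption: this
# from_pretrained/recipe signature matches the installed version).
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

# Generate against the exported ExecuTorch program; text_generation is the
# helper shown in the project docs (treat exact name/kwargs as assumptions).
print(model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=64,
))
```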
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ★84 · Updated last week
- Use safetensors with ONNX 🤗 ★78 · Updated 2 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ★225 · Updated last week
- Scalable and Performant Data Loading ★356 · Updated this week
- ★219 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★267 · Updated 3 weeks ago
- Load compute kernels from the Hub ★352 · Updated last week
- Python bindings for ggml ★146 · Updated last year
- Google TPU optimizations for transformers models ★131 · Updated last week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ★155 · Updated last year
- A tool to configure, launch and manage your machine learning experiments. ★212 · Updated this week
- Fast low-bit matmul kernels in Triton ★413 · Updated last week
- 👷 Build compute kernels ★195 · Updated last week
- MLX support for the Open Neural Network Exchange (ONNX) ★63 · Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ★202 · Updated 3 months ago
- Explore training for quantized models ★25 · Updated 5 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ★195 · Updated 6 months ago
- Visualize ONNX models with model-explorer ★66 · Updated 2 weeks ago
- Inference of Vision Transformer (ViT) in plain C/C++ with ggml ★304 · Updated last year
- Model compression for ONNX ★99 · Updated last year
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra… ★785 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ★227 · Updated last year
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ★105 · Updated last week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ★327 · Updated last month
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ★880 · Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ★320 · Updated last week
- Simple & Scalable Pretraining for Neural Architecture Research ★305 · Updated 3 weeks ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ★79 · Updated last week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ★325 · Updated 3 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ★903 · Updated last week