huggingface / optimum-executorch
🤗 Optimum ExecuTorch
⭐64 · Updated last week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐266 · Updated 10 months ago
- Load compute kernels from the Hub · ⭐244 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk · ⭐153 · Updated this week
- Python bindings for ggml · ⭐146 · Updated 11 months ago
- A tool to configure, launch and manage your machine learning experiments. · ⭐183 · Updated this week
- Fast low-bit matmul kernels in Triton · ⭐353 · Updated last week
- ⭐217 · Updated 7 months ago
- AI Edge Quantizer: flexible post training quantization for LiteRT models. · ⭐60 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) · ⭐868 · Updated last week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… · ⭐596 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX · ⭐224 · Updated last year
- Explore training for quantized models · ⭐22 · Updated last month
- Scalable and Performant Data Loading · ⭐291 · Updated this week
- On-device intelligence. · ⭐371 · Updated 5 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ⭐571 · Updated 2 weeks ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐261 · Updated last month
- VPTQ, A Flexible and Extreme low-bit quantization algorithm · ⭐652 · Updated 4 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates · ⭐170 · Updated 2 weeks ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml · ⭐292 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. · ⭐376 · Updated this week
- Google TPU optimizations for transformers models · ⭐118 · Updated 7 months ago
- Use safetensors with ONNX 🤗 · ⭐69 · Updated last month
- An innovative library for efficient LLM inference via low-bit quantization · ⭐349 · Updated 11 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ⭐389 · Updated 2 weeks ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) · ⭐193 · Updated this week
- ⭐325 · Updated 3 weeks ago
- AMD related optimizations for transformer models · ⭐82 · Updated this week
- Applied AI experiments and examples for PyTorch · ⭐291 · Updated this week
- LLM KV cache compression made easy · ⭐586 · Updated this week
- ⭐232 · Updated last week