huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆88 · Updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆81 · Updated 3 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆214 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- Load compute kernels from the Hub ☆348 · Updated this week
- ☆219 · Updated 10 months ago
- Fast low-bit matmul kernels in Triton ☆402 · Updated 2 weeks ago
- Scalable and Performant Data Loading ☆349 · Updated this week
- A tool to configure, launch, and manage your machine learning experiments. ☆209 · Updated this week
- Google TPU optimizations for transformers models ☆124 · Updated 10 months ago
- 👷 Build compute kernels ☆192 · Updated this week
- Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration wi… ☆753 · Updated this week
- Official implementation for Training LLMs with MXFP4 ☆111 · Updated 7 months ago
- Explore training for quantized models ☆25 · Updated 4 months ago
- Use safetensors with ONNX 🤗 ☆76 · Updated 2 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆196 · Updated 6 months ago
- MLX support for the Open Neural Network Exchange (ONNX) ☆62 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated last year
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆855 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆226 · Updated last year
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ☆95 · Updated last week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆200 · Updated 2 months ago
- Easy and Efficient Quantization for Transformers ☆203 · Updated 5 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆378 · Updated 7 months ago
- Simple high-throughput inference library ☆150 · Updated 6 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆78 · Updated 2 months ago
- Open-source reproducible benchmarks from Argmax ☆70 · Updated this week
- A simple, hackable text-to-speech system in PyTorch and MLX ☆183 · Updated 4 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 · Updated 8 months ago
- Visualize ONNX models with model-explorer ☆64 · Updated last month
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆412 · Updated this week