huggingface / optimum-executorch
🤗 Optimum ExecuTorch
⭐58 · Updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- A safetensors extension to efficiently store sparse quantized tensors on disk ⭐142 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐266 · Updated 9 months ago
- ⭐215 · Updated 6 months ago
- Fast low-bit matmul kernels in Triton ⭐338 · Updated last week
- Python bindings for ggml ⭐142 · Updated 11 months ago
- Load compute kernels from the Hub ⭐220 · Updated this week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ⭐56 · Updated last week
- A tool to configure, launch and manage your machine learning experiments. ⭐174 · Updated last week
- An innovative library for efficient LLM inference via low-bit quantization ⭐349 · Updated 11 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ⭐205 · Updated 3 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ⭐372 · Updated 3 months ago
- ⭐227 · Updated last week
- Google TPU optimizations for transformers models ⭐117 · Updated 6 months ago
- This repository contains the experimental PyTorch native float8 training UX ⭐224 · Updated last year
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ⭐203 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm ⭐649 · Updated 3 months ago
- Applied AI experiments and examples for PyTorch ⭐289 · Updated 2 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ⭐855 · Updated last week
- Explore training for quantized models ⭐20 · Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ⭐102 · Updated last year
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ⭐564 · Updated this week
- ⭐162 · Updated last year
- Use safetensors with ONNX 🤗 ⭐69 · Updated last month
- Efficient LLM Inference over Long Sequences ⭐385 · Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ⭐188 · Updated 2 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ⭐163 · Updated 3 weeks ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ⭐154 · Updated 9 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ⭐289 · Updated last year
- High-Performance SGEMM on CUDA devices ⭐98 · Updated 6 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐206 · Updated last week