huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆53 · Updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆129 · Updated this week
- Load compute kernels from the Hub ☆191 · Updated this week
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆264 · Updated 8 months ago
- Explore training for quantized models ☆18 · Updated this week
- A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime ☆172 · Updated 2 months ago
- A tool to configure, launch, and manage your machine learning experiments ☆161 · Updated this week
- Google TPU optimizations for transformers models ☆113 · Updated 5 months ago
- ☆68 · Updated this week
- This repository contains the experimental PyTorch-native float8 training UX ☆224 · Updated 10 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ☆49 · Updated this week
- Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations" ☆288 · Updated 4 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 8 months ago
- This repository contains the training code of ParetoQ, introduced in the work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆80 · Updated 3 weeks ago
- ☆213 · Updated 5 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 · Updated last month
- PTX-Tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 3 months ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 · Updated last month
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆109 · Updated 8 months ago
- Model compression for ONNX ☆96 · Updated 7 months ago
- KV cache compression for high-throughput LLM inference ☆131 · Updated 4 months ago
- Aana SDK is a powerful framework for building AI-enabled multimodal applications ☆47 · Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆62 · Updated 2 months ago
- NanoGPT speedrunning for the poor T4 enjoyers ☆66 · Updated 2 months ago
- Applied AI experiments and examples for PyTorch ☆277 · Updated 3 weeks ago
- Use safetensors with ONNX 🤗 ☆63 · Updated 3 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 · Updated 3 weeks ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆198 · Updated 11 months ago
- ☆137 · Updated this week
- Samples of good AI-generated CUDA kernels ☆83 · Updated 3 weeks ago