huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆53 · Updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆129 · Updated this week
- Load compute kernels from the Hub ☆191 · Updated this week
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆264 · Updated 8 months ago
- Explore training for quantized models ☆18 · Updated this week
- A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime ☆172 · Updated 2 months ago
- A tool to configure, launch, and manage your machine learning experiments ☆161 · Updated this week
- Google TPU optimizations for transformers models ☆113 · Updated 5 months ago
- ☆68 · Updated this week
- This repository contains the experimental PyTorch-native float8 training UX ☆224 · Updated 10 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ☆49 · Updated this week
- Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations" ☆288 · Updated 4 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 8 months ago
- This repository contains the training code of ParetoQ, introduced in the work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆80 · Updated 3 weeks ago
- ☆213 · Updated 5 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 · Updated last month
- PTX-Tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 3 months ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 · Updated last month
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆109 · Updated 8 months ago
- Model compression for ONNX ☆96 · Updated 7 months ago
- KV cache compression for high-throughput LLM inference ☆131 · Updated 4 months ago
- Aana SDK is a powerful framework for building AI-enabled multimodal applications ☆47 · Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆62 · Updated 2 months ago
- NanoGPT speedrunning for the poor T4 enjoyers ☆66 · Updated 2 months ago
- Applied AI experiments and examples for PyTorch ☆277 · Updated 3 weeks ago
- Use safetensors with ONNX 🤗 ☆63 · Updated 3 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 · Updated 3 weeks ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆198 · Updated 11 months ago
- ☆137 · Updated this week
- Samples of good AI-generated CUDA kernels ☆83 · Updated 3 weeks ago