huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆108 · Updated this week
Alternatives and similar repositories for optimum-executorch
Users interested in optimum-executorch are comparing it to the libraries listed below.
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆96 · Updated this week
- Use safetensors with ONNX 🤗 ☆84 · Updated 3 weeks ago
- Supports PyTorch model conversion to LiteRT. ☆930 · Updated this week
- Google TPU optimizations for transformers models ☆134 · Updated 2 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆238 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. ☆216 · Updated this week
- Load compute kernels from the Hub ☆397 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 months ago
- Python bindings for ggml ☆147 · Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆203 · Updated 4 months ago
- Visualize ONNX models with model-explorer ☆67 · Updated 3 weeks ago
- ☆219 · Updated last year
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆79 · Updated last month
- Scalable and Performant Data Loading ☆364 · Updated this week
- Fast low-bit matmul kernels in Triton ☆427 · Updated last week
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ☆114 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆198 · Updated 8 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆306 · Updated last year
- A user-friendly toolchain that enables the seamless execution of ONNX models using JAX as the backend. ☆130 · Updated last week
- Open-source reproducible benchmarks from Argmax ☆77 · Updated 2 weeks ago
- 👷 Build compute kernels ☆215 · Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆420 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆205 · Updated this week
- A simple, hackable text-to-speech system in PyTorch and MLX ☆187 · Updated 6 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated last week
- MLX support for the Open Neural Network Exchange (ONNX) ☆63 · Updated last year
- Where GPUs get cooked 👩‍🍳🔥 ☆363 · Updated 2 weeks ago
- Explore training for quantized models ☆26 · Updated 6 months ago
- Notes and artifacts from the ONNX steering committee ☆28 · Updated last week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆334 · Updated 3 months ago