huggingface / optimum-executorch
🤗 Optimum ExecuTorch
☆46Updated last week
Alternatives and similar repositories for optimum-executorch
Users that are interested in optimum-executorch are comparing it to the libraries listed below
- A high-throughput and memory-efficient inference and serving engine for LLMs☆263Updated 7 months ago
- Google TPU optimizations for transformers models☆112Updated 4 months ago
- Fast low-bit matmul kernels in Triton☆311Updated this week
- Python bindings for ggml☆141Updated 9 months ago
- Aana SDK is a powerful framework for building AI-enabled multimodal applications.☆47Updated last week
- Load compute kernels from the Hub☆139Updated last week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆153Updated 7 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆184Updated last week
- ModernBERT model optimized for Apple Neural Engine.☆26Updated 4 months ago
- Explore training for quantized models☆18Updated this week
- making the official triton tutorials actually comprehensible☆34Updated 2 months ago
- ☆88Updated last year
- Lightweight toolkit package to train and fine-tune 1.58bit Language models☆69Updated 2 weeks ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference☆60Updated 2 months ago
- Simple high-throughput inference library☆115Updated 3 weeks ago
- ☆130Updated 2 months ago
- ☆210Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆199Updated 10 months ago
- Thin wrapper around GGML to make life easier☆34Updated this week
- Open-source and reproducible benchmarks for Speaker Diarization☆26Updated last month
- ☆119Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk☆117Updated this week
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…☆136Updated 2 weeks ago
- ☆49Updated last year
- This repository contains the experimental PyTorch native float8 training UX☆223Updated 10 months ago
- Applied AI experiments and examples for PyTorch☆271Updated last week
- Experiments with BitNet inference on CPU☆55Updated last year
- Model compression for ONNX☆96Updated 6 months ago
- KV cache compression for high-throughput LLM inference☆129Updated 4 months ago
- ☆22Updated last year