pytorch / executorch
On-device AI across mobile, embedded and edge for PyTorch
☆2,829 · Updated this week
Alternatives and similar repositories for executorch
Users interested in executorch are comparing it to the libraries listed below.
- PyTorch native quantization and sparsity for training and inference ☆2,030 · Updated this week
- A PyTorch native library for large-scale model training ☆3,675 · Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,580 · Updated last week
- PyTorch native post-training library ☆5,171 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,400 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,172 · Updated 7 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆2,991 · Updated this week
- Tile primitives for speedy kernels ☆2,339 · Updated this week
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,295 · Updated 3 weeks ago
- Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and … ☆1,340 · Updated this week
- TinyChatEngine: On-Device LLM Inference Library ☆846 · Updated 10 months ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆569 · Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆2,815 · Updated this week
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizati… ☆10,476 · Updated this week
- A modern model graph visualizer and debugger ☆1,177 · Updated this week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ☆2,892 · Updated this week
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… ☆909 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆3,973 · Updated this week
- A PyTorch quantization backend for optimum ☆928 · Updated 2 weeks ago
- Training LLMs with QLoRA + FSDP ☆1,476 · Updated 6 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,155 · Updated this week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,202 · Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,147 · Updated this week
- A simple, performant and scalable Jax LLM! ☆1,717 · Updated this week
- Puzzles for learning Triton ☆1,614 · Updated 5 months ago
- Modeling, training, eval, and inference code for OLMo ☆5,581 · Updated last week
- SGLang is a fast serving framework for large language models and vision language models. ☆14,188 · Updated this week
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization, https://arxiv.org/pdf/2401.06118.p… ☆1,254 · Updated last week
- Development repository for the Triton language and compiler ☆15,504 · Updated this week