pytorch / executorch
On-device AI across mobile, embedded and edge for PyTorch
☆1,698 · Updated this week
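To give a sense of what ExecuTorch does, here is a minimal sketch of the export flow described in the ExecuTorch documentation: capture a module with torch.export, lower it to the Edge dialect, and serialize it as a .pte program for the on-device runtime. It assumes a recent `executorch` release; exact module paths and names may differ between versions.

```python
# Minimal ExecuTorch export sketch (assumes a recent `executorch` release;
# API details may differ between versions).
import torch
from executorch.exir import to_edge


class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y


model = Add().eval()
example_inputs = (torch.randn(4), torch.randn(4))

# 1. Capture the model as an ExportedProgram with torch.export.
exported_program = torch.export.export(model, example_inputs)

# 2. Lower the exported program to the Edge dialect.
edge_program = to_edge(exported_program)

# 3. Convert to an ExecuTorch program and write the .pte file
#    consumed by the on-device runtime.
executorch_program = edge_program.to_executorch()
with open("add.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

The resulting add.pte file is what the ExecuTorch runtime loads on mobile and embedded targets.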
Related projects:
- A native PyTorch library for large model training ☆1,544 · Updated this week
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… ☆1,131 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,106 · Updated 3 weeks ago
- Training LLMs with QLoRA + FSDP ☆1,382 · Updated last week
- A simple, performant and scalable JAX LLM! ☆1,450 · Updated this week
- ☆870 · Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,100 · Updated this week
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p… ☆1,116 · Updated last week
- A native PyTorch library for LLM fine-tuning ☆3,942 · Updated this week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆955 · Updated last month
- An Extensible Deep Learning Library ☆1,784 · Updated this week
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ☆1,786 · Updated last week
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024). ☆929 · Updated 2 weeks ago
- PyTorch native quantization and sparsity for training and inference ☆726 · Updated this week
- TinyChatEngine: On-Device LLM Inference Library ☆699 · Updated 2 months ago
- Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale. ☆2,055 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆5,121 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆2,333 · Updated 2 months ago
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ☆5,506 · Updated this week
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch ☆1,529 · Updated last week
- Tile primitives for speedy kernels ☆1,489 · Updated this week
- Blazingly fast LLM inference. ☆3,406 · Updated this week
- A modern model graph visualizer and debugger ☆976 · Updated this week
- Run Mixtral-8x7B models in Colab or on consumer desktops ☆2,288 · Updated 5 months ago
- llama3.np is a pure NumPy implementation of the Llama 3 model. ☆955 · Updated 3 months ago
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆2,577 · Updated this week
- Tools for merging pretrained large language models. ☆4,501 · Updated this week
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,354 · Updated last week
- Inference Llama 2 in one file of pure 🔥 ☆2,091 · Updated 3 months ago
- A PyTorch quantization backend for optimum ☆758 · Updated this week