pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
⭐211 · Updated this week
Alternatives and similar repositories for applied-ai:
Users interested in applied-ai are comparing it to the libraries listed below:
- Fast low-bit matmul kernels in Triton ⭐187 · Updated last week
- ⭐170 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐182 · Updated this week
- Cataloging released Triton kernels. ⭐155 · Updated last week
- Collection of kernels written in the Triton language ⭐90 · Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX ⭐219 · Updated 5 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ⭐219 · Updated last week
- ⭐178 · Updated 6 months ago
- Extensible collectives library in Triton ⭐76 · Updated 3 months ago
- ⭐96 · Updated 4 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline. ⭐93 · Updated 6 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). ⭐230 · Updated 2 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ⭐204 · Updated 4 months ago
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving ⭐481 · Updated 2 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ⭐272 · Updated last month
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ⭐290 · Updated 6 months ago
- Zero Bubble Pipeline Parallelism ⭐309 · Updated 2 months ago
- Triton-based implementation of Sparse Mixture of Experts. ⭐192 · Updated last month
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐215 · Updated this week
- ⭐64 · Updated 2 months ago
- Fastest kernels written from scratch ⭐118 · Updated last month
- A fast communication-overlapping library for tensor parallelism on GPUs. ⭐271 · Updated 2 months ago
- A collection of memory efficient attention operators implemented in the Triton language. ⭐229 · Updated 7 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ⭐75 · Updated this week
- ⭐157 · Updated last year
- Fast Hadamard transform in CUDA, with a PyTorch interface ⭐132 · Updated 7 months ago
- LLM KV cache compression made easy ⭐303 · Updated this week
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference for large language models. ⭐315 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ⭐64 · Updated 4 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ⭐325 · Updated 5 months ago