pytorch-labs / applied-ai

Applied AI experiments and examples for PyTorch

☆289

Alternatives and similar repositories for applied-ai

Users who are interested in applied-ai are comparing it to the repositories listed below.

gpu-mode / triton-index
Cataloging released Triton kernels.
☆246 · Updated 6 months ago
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆338 · Updated this week
Deep-Learning-Profiling-Tools / triton-viz
☆227 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆374 · Updated this week
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆193 · Updated this week
ColfaxResearch / cutlass-kernels
☆227 · Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207 · Updated last week
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 · Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆139 · Updated 3 months ago
usyd-fsalab / fp6_llm
An efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
☆260 · Updated 2 weeks ago
cchan / tccl
extensible collectives library in triton
☆88 · Updated 4 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆106 · Updated 2 months ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆203 · Updated 2 months ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆273 · Updated last year
triton-lang / kernels
☆85 · Updated 8 months ago
HazyResearch / Megakernels
kernels, of the mega variety
☆461 · Updated 2 months ago
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆210 · Updated last year
yifuwang / symm-mem-recipes
☆101 · Updated 7 months ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆308 · Updated 3 months ago
pytorch-labs / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆200 · Updated this week
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆318 · Updated last year
gpu-mode / ring-attention
ring-attention experiments
☆145 · Updated 9 months ago
stanford-futuredata / stk
☆107 · Updated 11 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆227 · Updated 8 months ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 · Updated last week
ppl-ai / pplx-kernels
Perplexity GPU Kernels
☆413 · Updated 2 weeks ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆411 · Updated 2 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆134 · Updated 2 weeks ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆405 · Updated 2 months ago
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118 · Updated last year