gpu-mode / popcorn

☆19

Alternatives and similar repositories for popcorn

Users who are interested in popcorn are comparing it to the libraries listed below.

meta-pytorch / BackendBench
How to ensure correctness and ship LLM-generated kernels in PyTorch
☆121 · Updated last week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆305 · Updated 3 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆250 · Updated this week
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆61 · Updated last week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆216 · Updated last week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆267 · Updated 2 months ago
cchan / tccl
Extensible collectives library in Triton
☆91 · Updated 7 months ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆398 · Updated this week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆286 · Updated this week
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114 · Updated 2 years ago
siboehm / ShallowSpeed
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
☆150 · Updated 2 years ago
gpu-mode / ring-attention
ring-attention experiments
☆155 · Updated last year
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆63 · Updated last month
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆167 · Updated 7 months ago
vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆163 · Updated this week
triton-lang / kernels
☆93 · Updated last year
yifuwang / symm-mem-recipes
☆148 · Updated 10 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch-native float8 training UX
☆225 · Updated last year
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆158 · Updated last week
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆244 · Updated 6 months ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 · Updated 2 months ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆272 · Updated 2 weeks ago
NVIDIA / compute-eval
Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…
☆74 · Updated last month
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆111 · Updated last week
stanford-futuredata / stk
☆113 · Updated last year
perplexityai / pplx-garden
Perplexity open source garden for inference technology
☆232 · Updated this week
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆291 · Updated this week
apple / ml-recurrent-drafter
☆218 · Updated 10 months ago
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆385 · Updated last week
pytorch / rfcs
PyTorch RFCs (experimental)
☆136 · Updated 5 months ago