gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆46 · Updated 2 weeks ago
Alternatives and similar repositories for discord-cluster-manager
Users interested in discord-cluster-manager are comparing it to the libraries listed below.
- Extensible collectives library in Triton ☆87 · Updated 3 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆134 · Updated last year
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 3 months ago
- A bunch of kernels that might make stuff slower 😉 ☆54 · Updated this week
- Experiment of using Tangent to autodiff Triton ☆79 · Updated last year
- ☆28 · Updated 5 months ago
- Collection of kernels written in the Triton language ☆136 · Updated 3 months ago
- Ring-attention experiments ☆144 · Updated 8 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆62 · Updated last week
- High-performance SGEMM on CUDA devices ☆97 · Updated 5 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆64 · Updated 3 months ago
- TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code… ☆126 · Updated this week
- Learn CUDA with PyTorch ☆29 · Updated this week
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆96 · Updated last month
- ☆21 · Updated 4 months ago
- Personal solutions to the Triton Puzzles ☆19 · Updated 11 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆188 · Updated last month
- Fast low-bit matmul kernels in Triton ☆327 · Updated this week
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism ☆128 · Updated 3 months ago
- Samples of good AI-generated CUDA kernels ☆84 · Updated last month
- PyTorch Single Controller ☆296 · Updated this week
- ☆225 · Updated this week
- Train with kittens! ☆61 · Updated 8 months ago
- Load compute kernels from the Hub ☆203 · Updated this week
- Learning about CUDA by writing PTX code ☆133 · Updated last year
- Cataloging released Triton kernels ☆242 · Updated 6 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆51 · Updated this week
- ☆33 · Updated 2 weeks ago
- A Quirky Assortment of CuTe Kernels ☆126 · Updated last week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆195 · Updated 2 months ago