gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆34 · Updated this week
Alternatives and similar repositories for discord-cluster-manager:
Users interested in discord-cluster-manager are comparing it to the libraries listed below.
- extensible collectives library in triton ☆84 · Updated 6 months ago
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- Fast low-bit matmul kernels in Triton ☆267 · Updated this week
- Cataloging released Triton kernels. ☆204 · Updated 2 months ago
- Collection of kernels written in Triton language ☆114 · Updated last month
- Custom kernels in Triton language for accelerating LLMs ☆18 · Updated 11 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆126 · Updated last year
- ☆27 · Updated 2 months ago
- High-Performance SGEMM on CUDA devices ☆86 · Updated 2 months ago
- ☆73 · Updated 4 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆234 · Updated this week
- ☆191 · Updated this week
- Make triton easier ☆47 · Updated 9 months ago
- ring-attention experiments ☆127 · Updated 5 months ago
- ☆21 · Updated 2 weeks ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆54 · Updated last month
- Explore training for quantized models ☆17 · Updated 2 months ago
- ☆62 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆249 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆104 · Updated this week
- Learn CUDA with PyTorch ☆19 · Updated last month
- ☆151 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 7 months ago
- seqax = sequence modeling + JAX ☆150 · Updated this week