gpu-mode / discord-cluster-manager

Write a fast kernel and run it on Discord. See how you compare against the best!

☆61

Alternatives and similar repositories for discord-cluster-manager

Users who are interested in discord-cluster-manager are comparing it to the repositories listed below

meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆124 Updated 2 weeks ago
gpu-mode / ring-attention
ring-attention experiments
☆160 Updated last year
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆151 Updated 2 years ago
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆65 Updated this week
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
cchan / tccl
extensible collectives library in triton
☆91 Updated 8 months ago
NVIDIA / nsight-python
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
☆61 Updated last week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆401 Updated last week
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆220 Updated this week
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆138 Updated 2 months ago
HazyResearch / train-tk
train with kittens!
☆63 Updated last year
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆112 Updated 10 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196 Updated 6 months ago
gpu-mode / popcorn-cli
☆75 Updated 3 weeks ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆169 Updated 7 months ago
AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
☆78 Updated 2 months ago
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆111 Updated last year
Jokeren / triton-samples
☆28 Updated 10 months ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆164 Updated this week
alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆20 Updated last year
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)
☆66 Updated 8 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆256 Updated last week
SzymonOzog / Penny
Hand-Rolled GPU communications library
☆72 Updated last week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆274 Updated 2 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆190 Updated this week
daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆30 Updated 3 months ago
lessw2020 / triton_kernels_for_fun_and_profit
Custom kernels in Triton language for accelerating LLMs
☆27 Updated last year
NVIDIA / compute-eval
Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…
☆76 Updated last week
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226 Updated last year
lianakoleva / no-libtorch-compile
☆21 Updated 9 months ago