NVlabs / vibetensor

Our first fully AI-generated deep learning system

☆481

Alternatives and similar repositories for vibetensor

Users who are interested in vibetensor compare it to the libraries listed below.

meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆140 Updated 3 weeks ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆324 Updated this week
gpu-mode / ring-attention
ring-attention experiments
☆165 Updated last year
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation via Deep Agents
☆228 Updated this week
NVIDIA / TileGym
Helpful kernel tutorials and examples for tile-based GPU programming
☆630 Updated this week
microsoft / AttentionEngine
☆118 Updated 8 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆569 Updated 3 weeks ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆194 Updated this week
NVIDIA / jaxpp
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
☆64 Updated 2 weeks ago
HazyResearch / Megakernels
kernels, of the mega variety
☆665 Updated last week
Deep-Learning-Profiling-Tools / triton-viz
☆286 Updated last week
deepreinforce-ai / CUDA-L1
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
☆287 Updated 3 months ago
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆613 Updated this week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆178 Updated 2 weeks ago
AndreSlavescu / mHC.cu
mHC kernels implemented in CUDA
☆249 Updated 3 weeks ago
z-lab / dflash
Block Diffusion for Ultra-Fast Speculative Decoding
☆459 Updated this week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆427 Updated last week
deepseek-ai / LPLB
An early research stage expert-parallel load balancer for MoE models based on linear programming.
☆495 Updated 2 months ago
triton-lang / kernels
☆104 Updated last year
sgl-project / sglang-jax
JAX backend for SGL
☆234 Updated this week
cchan / tccl
extensible collectives library in triton
☆95 Updated 10 months ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆739 Updated this week
vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆228 Updated this week
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆441 Updated this week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆315 Updated 5 months ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆201 Updated this week
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆75 Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆292 Updated 5 months ago
Dao-AILab / grouped-latent-attention
☆131 Updated 8 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆327 Updated this week