meta-pytorch / BackendBench
How to ensure correctness and ship LLM-generated kernels in PyTorch
☆114 · Updated 2 weeks ago
Alternatives and similar repositories for BackendBench
Users who are interested in BackendBench are comparing it to the libraries listed below.
- Extensible collectives library in Triton ☆90 · Updated 7 months ago
- Triton-based Symmetric Memory operators and examples ☆61 · Updated 3 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated this week
- ring-attention experiments ☆155 · Updated last year
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆155 · Updated this week
- ☆93 · Updated last year
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆55 · Updated last month
- Framework to reduce autotune overhead to zero for well known deployments. ☆85 · Updated last month
- A bunch of kernels that might make stuff slower 😉 ☆64 · Updated this week
- ☆13 · Updated 2 weeks ago
- Collection of kernels written in Triton language ☆161 · Updated 7 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆104 · Updated this week
- ☆246 · Updated this week
- Autonomous GPU Kernel Generation via Deep Agents ☆82 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆277 · Updated last week
- JAX backend for SGL ☆146 · Updated this week
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆128 · Updated last week
- DeeperGEMM: crazy optimized version ☆73 · Updated 6 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆377 · Updated 3 weeks ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆75 · Updated last month
- Applied AI experiments and examples for PyTorch ☆302 · Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆85 · Updated last year
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆169 · Updated last week
- ☆65 · Updated 6 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆72 · Updated last month
- ☆63 · Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆100 · Updated 4 months ago
- Cataloging released Triton kernels. ☆265 · Updated 2 months ago
- torchcomms: a modern PyTorch communications API ☆245 · Updated this week
- ☆112 · Updated last year