NVIDIA/Fuser

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/NVIDIA/Fuser)

NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

☆396

Alternatives and similar repositories for Fuser

Users that are interested in Fuser are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

csarofeen / pytorch
View on GitHub
Tensors and Dynamic neural networks in Python with strong GPU acceleration
☆27Apr 20, 2023Updated 3 years ago
NVIDIA / nvbench
View on GitHub
CUDA Kernel Benchmarking Library
☆900Updated this week
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,434Updated this week
NVIDIA / jitify
View on GitHub
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆573Sep 15, 2025Updated 10 months ago
NVIDIA / cccl
View on GitHub
CUDA Core Compute Libraries
☆2,431Updated this week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Lightning-AI / lightning-thunder
View on GitHub
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri…
☆1,465Updated this week
NVIDIA / NVTX
View on GitHub
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆544Updated this week
TiledTensor / TiledCUDA
View on GitHub
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192Jan 28, 2025Updated last year
llvm / torch-mlir
View on GitHub
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
☆1,867Updated this week
NVIDIA / cutlass
View on GitHub
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104Updated this week
pytorch / torchdynamo
View on GitHub
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,078Apr 17, 2024Updated 2 years ago
NVIDIA / tilus
View on GitHub
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489Jul 5, 2026Updated 2 weeks ago
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,493Jul 11, 2026Updated last week
Dao-AILab / quack
View on GitHub
A Quirky Assortment of CuTe Kernels
☆1,060Updated this week
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
hidet-org / hidet
View on GitHub
An open-source efficient deep learning framework/compiler, written in python.
☆743Sep 4, 2025Updated 10 months ago
TiledTensor / TiledKernel
View on GitHub
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19May 12, 2024Updated 2 years ago
NVIDIA / cub
View on GitHub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,840Oct 9, 2023Updated 2 years ago
HazyResearch / ThunderKittens
View on GitHub
Tile primitives for speedy kernels
☆3,550Jul 13, 2026Updated last week
TiledTensor / TiledLower
View on GitHub
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13Nov 23, 2024Updated last year
microsoft / nnfusion
View on GitHub
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002Sep 19, 2024Updated last year
facebookresearch / HolisticTraceAnalysis
View on GitHub
A library to analyze PyTorch traces.
☆535May 29, 2026Updated last month
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆5,983Updated this week
NVIDIA / CUDALibrarySamples
View on GitHub
CUDA Library Samples
☆2,463Updated this week
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
tlc-pack / cutlass_fpA_intB_gemm
View on GitHub
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96Jun 21, 2026Updated 3 weeks ago
openucx / ucc
View on GitHub
Unified Collective Communication Library
☆310Updated this week
ColfaxResearch / cutlass-kernels
View on GitHub
☆269Jul 11, 2024Updated 2 years ago
NVIDIA / MatX
View on GitHub
An efficient C++20 GPU numerical computing library with Python-like syntax
☆1,438Updated this week
uwsampl / SparseTIR
View on GitHub
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145Mar 31, 2023Updated 3 years ago
NVIDIA / HMM_sample_code
View on GitHub
CUDA 12.2 HMM demos
☆21Jul 26, 2024Updated last year
meta-pytorch / tlparse
View on GitHub
TORCH_TRACE parser for PT2
☆90May 11, 2026Updated 2 months ago
meta-pytorch / MSLK
View on GitHub
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…
☆121Updated this week
microsoft / msccl
View on GitHub
Microsoft Collective Communication Library
☆394Sep 20, 2023Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
openxla / stablehlo
View on GitHub
Backward compatible ML compute opset inspired by HLO/MHLO
☆672Updated this week
bytedance / flux
View on GitHub
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,343Aug 28, 2025Updated 10 months ago
microsoft / mscclpp
View on GitHub
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆541Updated this week
NVIDIA / cudnn-frontend
View on GitHub
cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source …
☆884Updated this week
mirage-project / mirage
View on GitHub
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,377Updated this week
sail-sg / zero-bubble-pipeline-parallelism
View on GitHub
Zero Bubble Pipeline Parallelism
☆462May 7, 2025Updated last year
pytorch / helion
View on GitHub
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆909Updated this week