nod-ai / techtalksLinks

☆15

Alternatives and similar repositories for techtalks

Users that are interested in techtalks are comparing it to the libraries listed below

Sorting:

ColfaxResearch / cutlass-kernels
☆229Updated last year
pranjalssh / fast.cu
Fastest kernels written from scratch
☆318Updated 5 months ago
cchan / tccl
extensible collectives library in triton
☆88Updated 5 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆218Updated last year
Jokeren / triton-samples
☆28Updated 7 months ago
albanD / subclass_zoo
☆174Updated last year
triton-lang / kernels
☆88Updated 9 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆152Updated 4 months ago
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https//github.com/openai/triton)
☆45Updated 2 weeks ago
wangsiping97 / FastGEMV
High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.
☆113Updated last year
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA( Tensor Core)
☆139Updated 5 years ago
sunlex0717 / DissectingTensorCores
☆106Updated last year
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆275Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆252Updated 7 months ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆138Updated 2 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆95Updated 2 months ago
ColfaxResearch / cfx-article-src
☆136Updated 3 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆234Updated 2 weeks ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆274Updated this week
openxla / shardy
MLIR-based partitioning system
☆125Updated this week
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆271Updated 4 years ago
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆351Updated this week
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆214Updated 3 months ago
facebookexperimental / triton
Github mirror of trition-lang/triton repo.
☆54Updated this week
openxla / community
Stores documents and resources used by the OpenXLA developer community
☆128Updated last year
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆72Updated last month
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆214Updated this week
Jokeren / GPA
GPU Performance Advisor
☆66Updated 3 years ago
roastduck / FreeTensor
A language and compiler for irregular tensor programs.
☆149Updated 9 months ago
microsoft / SparTA
☆150Updated last year