albanD / subclass_zoo

☆189

Alternatives and similar repositories for subclass_zoo

Users who are interested in subclass_zoo are comparing it to the libraries listed below.

pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆744 Updated this week
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆462 Updated last week
cchan / tccl
Extensible collectives library in Triton
☆95 Updated 10 months ago
google / aqt
☆344 Updated this week
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆128 Updated this week
Deep-Learning-Profiling-Tools / triton-viz
☆288 Updated this week
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆283 Updated 5 years ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆315 Updated 5 months ago
ColfaxResearch / cutlass-kernels
☆261 Updated last year
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆798 Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆292 Updated 5 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in the Triton language
☆178 Updated 2 weeks ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226 Updated last year
yifuwang / symm-mem-recipes
☆159 Updated last year
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆326 Updated this week
openxla / shardy
MLIR-based partitioning system
☆164 Updated this week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆429 Updated last week
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆164 Updated last month
hidet-org / hidet
An open-source, efficient deep learning framework/compiler, written in Python.
☆739 Updated 5 months ago
pytorch / rfcs
PyTorch RFCs (experimental)
☆138 Updated 8 months ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆137 Updated 3 years ago
spcl / substation
Research and development for optimizing transformers
☆131 Updated 4 years ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆326 Updated 2 months ago
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆115 Updated last year
siboehm / ShallowSpeed
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
☆155 Updated 2 years ago
awslabs / raf
☆145 Updated last year
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆379 Updated this week
Jokeren / triton-samples
☆28 Updated last year
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆284 Updated last month
NVIDIA / nsight-python
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight tools.
☆111 Updated last week