saifhaq / alma

☆20

Alternatives and similar repositories for alma

Users who are interested in alma are comparing it to the repositories listed below.

gpu-mode / profiling-cuda-in-torch
☆176 Updated last year
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆333 Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆580 Updated 2 months ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆392 Updated last week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆264 Updated 2 months ago
hkproj / triton-flash-attention
☆215 Updated 10 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆161 Updated 7 months ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆240 Updated 6 months ago
hkproj / quantization-notes
Notes on quantization in neural networks
☆104 Updated last year
Deep-Learning-Profiling-Tools / triton-viz
☆246 Updated this week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆302 Updated 2 months ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆276 Updated 3 years ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆100 Updated last month
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223 Updated last year
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆192 Updated 2 years ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆490 Updated last year
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆427 Updated 7 months ago
a-hamdi / GPU
100 days of building GPU kernels!
☆523 Updated 6 months ago
google / aqt
☆337 Updated this week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆216 Updated last week
hidet-org / hidet
An open-source efficient deep learning framework/compiler, written in python.
☆733 Updated 2 months ago
evintunador / triton_docs_tutorials
making the official triton tutorials actually comprehensible
☆60 Updated 2 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆195 Updated 5 months ago
HazyResearch / flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
☆330 Updated 10 months ago
LambdaLabsML / distributed-training-guide
Best practices & guides on how to write distributed pytorch training code
☆530 Updated 2 weeks ago
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆147 Updated 2 years ago
huggingface / nn_pruning
Prune a model while finetuning or training.
☆405 Updated 3 years ago
HazyResearch / aisys-building-blocks
Building blocks for foundation models.
☆567 Updated last year
dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆886 Updated 2 weeks ago
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆1,004 Updated 2 weeks ago