axonn-ai / axonnLinks

A parallel framework for training deep neural networks

☆63

Alternatives and similar repositories for axonn

Users that are interested in axonn are comparing it to the libraries listed below

Sorting:

cchan / tccl
extensible collectives library in triton
☆88Updated 4 months ago
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆56Updated last week
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224Updated last year
gpu-mode / ring-attention
ring-attention experiments
☆146Updated 9 months ago
stanford-futuredata / stk
☆107Updated 11 months ago
triton-lang / kernels
☆85Updated 9 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆48Updated this week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142Updated 4 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207Updated this week
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289Updated 2 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆227Updated this week
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆152Updated last month
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆212Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247Updated 6 months ago
HabanaAI / Megatron-DeepSpeed
Intel Gaudi's Megatron DeepSpeed Large Language Models for training
☆13Updated 7 months ago
deepspeedai / DeepSpeed-Kernels
☆74Updated 4 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
simveit / effective_transpose
Effective transpose on Hopper GPU
☆23Updated 3 months ago
ScalingIntelligence / hydragen
Hydragen: High-Throughput LLM Inference with Shared Prefixes
☆41Updated last year
Jokeren / triton-samples
☆28Updated 6 months ago
mayank31398 / ladder-residual-inference
☆14Updated 3 weeks ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆33Updated 3 weeks ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆107Updated 2 months ago
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆199Updated this week
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80Updated 11 months ago
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆23Updated last week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆79Updated 2 weeks ago
hpcaitech / TensorNVMe
A Python library transfers PyTorch tensors between CPU and NVMe
☆118Updated 8 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230Updated 8 months ago
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆339Updated this week