axonn-ai / axonn
A parallel framework for training deep neural networks
☆63 Updated 6 months ago
Alternatives and similar repositories for axonn
Users that are interested in axonn are comparing it to the libraries listed below
- extensible collectives library in triton ☆87 Updated 5 months ago
- A bunch of kernels that might make stuff slower 😉 ☆59 Updated this week
- Triton-based Symmetric Memory operators and examples ☆28 Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated last year
- ☆74 Updated 5 months ago
- ☆111 Updated last year
- ring-attention experiments ☆152 Updated 11 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆211 Updated this week
- ☆88 Updated 10 months ago
- How to ship your LLM generated kernels to PyTorch ☆49 Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆116 Updated 3 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆57 Updated this week
- Collection of kernels written in Triton language ☆154 Updated 5 months ago
- Effective transpose on Hopper GPU ☆23 Updated last week
- Framework to reduce autotune overhead to zero for well known deployments. ☆82 Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ☆239 Updated 3 weeks ago
- The evaluation framework for training-free sparse attention in LLMs ☆93 Updated 3 months ago
- ☆118 Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆121 Updated 9 months ago
- ☆14 Updated 2 months ago
- Experiment of using Tangent to autodiff triton ☆81 Updated last year
- ☆28 Updated 8 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆92 Updated last week
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆41 Updated last year
- Applied AI experiments and examples for PyTorch ☆294 Updated 3 weeks ago
- NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆68 Updated 9 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆82 Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆221 Updated this week
- ☆27 Updated 2 years ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆141 Updated last year