axonn-ai / axonn
A parallel framework for training deep neural networks
☆60 Updated 2 months ago
Alternatives and similar repositories for axonn
Users that are interested in axonn are comparing it to the libraries listed below
- extensible collectives library in triton ☆87 Updated 2 months ago
- A bunch of kernels that might make stuff slower ☆46 Updated this week
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆36 Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated this week
- ☆28 Updated 4 months ago
- Effective transpose on Hopper GPU ☆20 Updated last month
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆153 Updated this week
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- ☆93 Updated last week
- ☆80 Updated 6 months ago
- Sparsity support for PyTorch ☆35 Updated 2 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆44 Updated 10 months ago
- ☆13 Updated 3 weeks ago
- MLPerf™ logging library ☆36 Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆93 Updated last week
- ☆105 Updated 9 months ago
- Collection of kernels written in Triton language ☆125 Updated 2 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆45 Updated 2 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆223 Updated 10 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆116 Updated 6 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆127 Updated this week
- Make triton easier ☆47 Updated 11 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆181 Updated last month
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆133 Updated last year
- Example ML projects that use the Determined library. ☆32 Updated 8 months ago
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆74 Updated 3 weeks ago
- ☆71 Updated 2 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆217 Updated 6 months ago