graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆124 · Updated 3 months ago
Alternatives and similar repositories for unit-scaling:
Users who are interested in unit-scaling compare it with the libraries listed below.
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 7 months ago
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- ☆95 · Updated 9 months ago
- JAX bindings for Flash Attention v2 ☆88 · Updated 8 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago
- Accelerated First Order Parallel Associative Scan ☆177 · Updated 7 months ago
- A simple library for scaling up JAX programs ☆134 · Updated 4 months ago
- ☆220 · Updated last month
- ☆76 · Updated 8 months ago
- ☆157 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆208 · Updated 3 months ago
- seqax = sequence modeling + JAX ☆150 · Updated this week
- supporting pytorch FSDP for optimizers ☆79 · Updated 3 months ago
- extensible collectives library in triton ☆84 · Updated 6 months ago
- ☆140 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆104 · Updated this week
- ring-attention experiments ☆127 · Updated 5 months ago
- Implementation of a Transformer, but completely in Triton ☆260 · Updated 2 years ago
- Applied AI experiments and examples for PyTorch ☆249 · Updated this week
- jax-triton contains integrations between JAX and OpenAI Triton ☆386 · Updated last week
- ☆191 · Updated this week
- Implementation of Flash Attention in Jax ☆206 · Updated last year
- ☆101 · Updated 6 months ago
- LoRA for arbitrary JAX models and functions ☆135 · Updated last year
- The simplest but fast implementation of matrix multiplication in CUDA. ☆34 · Updated 7 months ago
- ☆184 · Updated last month
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆76 · Updated this week
- ☆52 · Updated 5 months ago
- ☆290 · Updated this week
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆170 · Updated 3 months ago