graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆125 · Updated 4 months ago
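For context, unit scaling aims to keep activations, weights, and gradients at roughly unit variance at initialisation, so that models can train directly in low-precision formats such as FP8 without per-tensor loss scaling. The sketch below is not the `unit_scaling` library's API; it is a minimal PyTorch illustration of the core trick, a matmul that applies separate, statically chosen scaling factors to its forward output, its gradient w.r.t. the input, and its gradient w.r.t. the weight. All class and variable names here are made up for illustration.

```python
import torch
import torch.nn as nn


class _UnitScaledMatmul(torch.autograd.Function):
    """y = x @ w.T, with separate scales for the forward output,
    the gradient w.r.t. x, and the gradient w.r.t. w."""

    @staticmethod
    def forward(ctx, x, w, fwd_scale, grad_x_scale, grad_w_scale):
        ctx.save_for_backward(x, w)
        ctx.grad_x_scale = grad_x_scale
        ctx.grad_w_scale = grad_w_scale
        return fwd_scale * (x @ w.t())

    @staticmethod
    def backward(ctx, grad_y):
        x, w = ctx.saved_tensors
        grad_x = ctx.grad_x_scale * (grad_y @ w)
        grad_w = ctx.grad_w_scale * (grad_y.t() @ x)
        # The three scale arguments are non-differentiable.
        return grad_x, grad_w, None, None, None


class UnitScaledLinear(nn.Module):
    """Bias-free linear layer (2-D inputs only) in the spirit of unit scaling:
    the weight is initialised from N(0, 1) and all scaling is moved into the
    op, chosen so outputs and both gradients start at ~unit variance."""

    def __init__(self, fan_in: int, fan_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(fan_out, fan_in))
        self.fan_in, self.fan_out = fan_in, fan_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch_size = x.shape[0]
        return _UnitScaledMatmul.apply(
            x,
            self.weight,
            self.fan_in ** -0.5,    # forward: unit-variance activations
            self.fan_out ** -0.5,   # backward: unit-variance grad w.r.t. input
            batch_size ** -0.5,     # backward: unit-variance grad w.r.t. weight
        )


x = torch.randn(4096, 1024, requires_grad=True)
layer = UnitScaledLinear(1024, 512)
y = layer(x)
y.backward(torch.randn_like(y))
# All three standard deviations should come out close to 1.0.
print(y.std().item(), x.grad.std().item(), layer.weight.grad.std().item())
```

Decoupling the forward and backward scales is what distinguishes this from simply rescaling the weights: a single multiplier would be shared by the output and both gradients, whereas here each can be normalised independently.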
Alternatives and similar repositories for unit-scaling:
Users interested in unit-scaling are comparing it to the libraries listed below.
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 8 months ago
- ☆100 · Updated 10 months ago
- Accelerated First Order Parallel Associative Scan ☆180 · Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆211 · Updated 4 months ago
- A simple library for scaling up JAX programs ☆134 · Updated 5 months ago
- ☆76 · Updated 9 months ago
- JAX bindings for Flash Attention v2 ☆89 · Updated 8 months ago
- LoRA for arbitrary JAX models and functions ☆136 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆108 · Updated this week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆229 · Updated last month
- Implementation of Flash Attention in Jax ☆206 · Updated last year
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 4 months ago
- Extensible collectives library in Triton ☆84 · Updated last week
- ☆157 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆263 · Updated 3 years ago
- ☆186 · Updated this week
- seqax = sequence modeling + JAX ☆153 · Updated this week
- ☆142 · Updated last year
- The simplest but fast implementation of matrix multiplication in CUDA. ☆34 · Updated 8 months ago
- ☆103 · Updated 7 months ago
- Applied AI experiments and examples for PyTorch ☆256 · Updated 3 weeks ago
- ☆224 · Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ☆190 · Updated last month
- Ring-attention experiments ☆129 · Updated 5 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆168 · Updated 10 months ago
- ☆295 · Updated this week
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- If it quacks like a tensor... ☆57 · Updated 5 months ago