graphcore-research / unit-scaling

A library for unit scaling in PyTorch

☆130

Alternatives and similar repositories for unit-scaling

Users who are interested in unit-scaling are comparing it to the libraries listed below.

srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223 Updated last year
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189 Updated last year
mgmalek / efficient_cross_entropy
☆121 Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83 Updated 10 months ago
cloneofsimo / min-fsdp
☆91 Updated last year
google-research / jaxpruner
☆234 Updated 8 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144 Updated 11 months ago
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆97 Updated this week
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241 Updated 4 months ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45 Updated last year
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆141 Updated last year
MatX-inc / seqax
seqax = sequence modeling + JAX
☆167 Updated 3 months ago
epfml / dynamic-sparse-flash-attention
☆149 Updated 2 years ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131 Updated 10 months ago
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆219 Updated last year
HomebrewML / HeavyBall
Efficient optimizers
☆274 Updated last week
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆276 Updated 3 years ago
google / aqt
☆332 Updated last month
ayaka14732 / jax-smi
JAX Synergistic Memory Inspector
☆179 Updated last year
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆86 Updated 3 weeks ago
shikaiqiu / compute-better-spent
☆58 Updated last year
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216 Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆246 Updated 2 weeks ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233 Updated 3 weeks ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆166 Updated 3 months ago
lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆188 Updated last week
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆69 Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
insuhan / hyper-attn
☆83 Updated last year