proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆169 · Updated 4 months ago
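
A minimal sketch of the operation this repository accelerates: the first-order linear recurrence h[t] = a[t] * h[t-1] + x[t], expressed as a parallel associative scan. The sketch uses `jax.lax.associative_scan` for brevity; it illustrates the math only, not this repository's own Triton/CUDA kernels or API, and the names and shapes are illustrative.

```python
# Minimal sketch of a first-order parallel associative scan (illustrative;
# not the accelerated-scan repository's kernels or API).
import jax
import jax.numpy as jnp

def combine(left, right):
    # Each element (a, x) encodes the affine map h -> a * h + x.
    # Composing left-then-right yields (a_r * a_l, a_r * x_l + x_r),
    # which is associative, so the scan parallelizes to O(log n) depth.
    a_l, x_l = left
    a_r, x_r = right
    return a_r * a_l, a_r * x_l + x_r

def linear_scan(a, x):
    # a, x: arrays with the sequence on the leading axis.
    # Returns all prefix states h[t] of the recurrence.
    _, h = jax.lax.associative_scan(combine, (a, x))
    return h

# Quick check against the sequential definition.
a = jnp.array([0.9, 0.8, 0.7, 0.6])
x = jnp.array([1.0, 2.0, 3.0, 4.0])
state, h_ref = 0.0, []
for a_t, x_t in zip(a.tolist(), x.tolist()):
    state = a_t * state + x_t
    h_ref.append(state)
assert jnp.allclose(linear_scan(a, x), jnp.array(h_ref))
```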
Alternatives and similar repositories for accelerated-scan:
Users interested in accelerated-scan are comparing it to the libraries listed below.
- A library for unit scaling in PyTorch · ☆118 · Updated last month
- Experiment of using Tangent to autodiff triton · ☆74 · Updated 11 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ☆102 · Updated last month
- ☆146 · Updated last month
- LoRA for arbitrary JAX models and functions · ☆135 · Updated 10 months ago
- Understand and test language model architectures on synthetic tasks. · ☆175 · Updated this week
- JAX bindings for Flash Attention v2 · ☆83 · Updated 6 months ago
- Efficient optimizers · ☆144 · Updated this week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆219 · Updated last month
- A simple library for scaling up JAX programs · ☆129 · Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX · ☆219 · Updated 5 months ago
- ☆135 · Updated last year
- PyTorch FSDP support for optimizers · ☆75 · Updated last month
- ☆83 · Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts. · ☆192 · Updated last month
- ☆201 · Updated 6 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness · ☆69 · Updated last month
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores · ☆292 · Updated 2 weeks ago
- 🧱 Modula software package · ☆132 · Updated this week
- seqax = sequence modeling + JAX · ☆136 · Updated 6 months ago
- ☆75 · Updated 6 months ago
- Implementation of the GateLoop Transformer in PyTorch and JAX · ☆87 · Updated 6 months ago
- ☆50 · Updated 3 months ago
- Implementation of Flash Attention in JAX · ☆204 · Updated 10 months ago
- Normalized Transformer (nGPT) · ☆145 · Updated last month
- A simple but fast implementation of matrix multiplication in CUDA · ☆34 · Updated 5 months ago
- ☆51 · Updated 7 months ago
- ☆275 · Updated this week
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead · ☆210 · Updated last week
- nanoGPT-like codebase for LLM training · ☆83 · Updated this week