HazyResearch / flash-fft-conv

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

☆329

Alternatives and similar repositories for flash-fft-conv

Users who are interested in flash-fft-conv are comparing it to the libraries listed below.

proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223 Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆580 Updated 2 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132 Updated 3 months ago
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆542 Updated 5 months ago
google / aqt
☆335 Updated last month
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆320 Updated 3 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆337 Updated last month
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆186 Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆490 Updated last year
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆103 Updated last week
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆253 Updated last week
meta-pytorch / attention-gym
Helpful tools and examples for working with flex-attention
☆1,029 Updated last week
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222 Updated last year
facebookresearch / spdl
Scalable and Performant Data Loading
☆311 Updated last week
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆291 Updated 4 months ago
epfml / dynamic-sparse-flash-attention
☆149 Updated 2 years ago
HomebrewML / HeavyBall
Efficient optimizers
☆275 Updated 2 weeks ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆159 Updated 6 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆246 Updated 3 weeks ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆276 Updated 3 years ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241 Updated 4 months ago
tensorgi / TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
☆401 Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆263 Updated last month
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆383 Updated 2 years ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆132 Updated 2 weeks ago
apple / ml-sigmoid-attention
☆302 Updated 6 months ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆385 Updated last week
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated 2 years ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆71 Updated 7 months ago