HazyResearch / flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
☆307 · Updated 2 months ago
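For context, FlashFFTConv accelerates the FFT-based long-convolution pattern used by long-sequence models such as Hyena by fusing the FFT, pointwise multiply, and inverse FFT into tensor-core-friendly kernels. Below is a minimal, unfused sketch of that pattern in plain PyTorch; it is not the repository's API, and the function name and tensor shapes are illustrative assumptions.

```python
import torch

def fft_long_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Reference FFT-based causal long convolution (the pattern FlashFFTConv speeds up).

    x: input activations of shape (batch, channels, seqlen)
    k: convolution filters of shape (channels, seqlen)
    Zero-padding the FFT to 2 * seqlen turns circular convolution into
    linear (causal) convolution; the output is truncated back to seqlen.
    """
    seqlen = x.shape[-1]
    fft_size = 2 * seqlen
    x_f = torch.fft.rfft(x.float(), n=fft_size)   # (B, C, fft_size // 2 + 1)
    k_f = torch.fft.rfft(k.float(), n=fft_size)   # (C, fft_size // 2 + 1), broadcasts over batch
    y = torch.fft.irfft(x_f * k_f, n=fft_size)[..., :seqlen]  # pointwise product, back to time domain
    return y.to(x.dtype)

# Example: 2 sequences, 8 channels, length 4096 -- O(L log L) work instead of O(L^2)
x = torch.randn(2, 8, 4096)
k = torch.randn(8, 4096)
out = fft_long_conv(x, k)  # shape (2, 8, 4096)
```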
Alternatives and similar repositories for flash-fft-conv:
Users interested in flash-fft-conv are comparing it to the libraries listed below.
- Accelerated First Order Parallel Associative Scan ☆177 · Updated 7 months ago
- Helpful tools and examples for working with flex-attention ☆695 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 7 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆523 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts. ☆208 · Updated 3 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆152 · Updated 10 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆223 · Updated last month
- Annotated version of the Mamba paper ☆477 · Updated last year
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆506 · Updated 5 months ago
- Scalable and Performant Data Loading ☆230 · Updated this week
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆211 · Updated 2 years ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆233 · Updated this week
- Cataloging released Triton kernels. ☆208 · Updated 2 months ago
- Some preliminary explorations of Mamba's context scaling. ☆212 · Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆221 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆250 · Updated this week
- Implementation of a memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory" ☆372 · Updated last year
- When it comes to optimizers, it's always better to be safe than sorry ☆214 · Updated last month
- Implementation of Adam-atan2, proposed by Google DeepMind, in Pytorch ☆101 · Updated 3 months ago
- Implementation of Rotary Embeddings, from the RoFormer paper, in Pytorch ☆651 · Updated 3 months ago
- Fast low-bit matmul kernels in Triton ☆267 · Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆226 · Updated last month
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆321 · Updated 9 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆179 · Updated 6 months ago