HazyResearch / flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores (☆263)
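The operation FlashFFTConv accelerates is the FFT-based long convolution: rather than the O(N²) direct sum, the signal and kernel are transformed, multiplied pointwise, and inverse-transformed in O(N log N). A minimal NumPy sketch of that idea (this is an illustration of the underlying math only, not FlashFFTConv's fused tensor-core implementation; the function name `fft_conv` is ours):

```python
import numpy as np

def fft_conv(u, k):
    """Convolution of signal u with kernel k via the FFT.

    Zero-pads to the next power of two so the circular FFT
    convolution equals the linear one, then keeps the first
    len(u) outputs (the causal part used in sequence models).
    """
    n = len(u) + len(k) - 1               # full linear-convolution length
    fft_size = 1 << (n - 1).bit_length()  # next power of two
    U = np.fft.rfft(u, n=fft_size)
    K = np.fft.rfft(k, n=fft_size)
    y = np.fft.irfft(U * K, n=fft_size)   # pointwise product in frequency domain
    return y[: len(u)]
```

This matches `np.convolve(u, k)[: len(u)]` up to floating-point error; the library's contribution is doing the same computation efficiently on GPU for very long sequences.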
Related projects:
- Helpful tools and examples for working with flex-attention (☆341)
- This repository contains the experimental PyTorch native float8 training UX (☆210)
- Accelerated First Order Parallel Associative Scan (☆151)
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" (☆206)
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch (☆278)
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch (☆451)
- Code for Adam-mini: Use Fewer Learning Rates To Gain More, https://arxiv.org/abs/2406.16793 (☆283)
- A repository for log-time feedforward networks (☆215)
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton (☆452)
- Annotated version of the Mamba paper (☆445)
- A library for unit scaling in PyTorch (☆94)
- Fast Hadamard transform in CUDA, with a PyTorch interface (☆87)
- Implementation of a memory-efficient multi-head attention as proposed in the paper "Self-attention Does Not Need O(n²) Memory" (☆355)
- Cataloging released Triton kernels (☆108)
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs (☆156)
- Implementation of fused cosine similarity attention in the same style as Flash Attention (☆204)
- Triton-based implementation of Sparse Mixture of Experts (☆166)
- Transformers with Arbitrarily Large Context (☆613)
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch (☆233)
- 94% on CIFAR-10 in 3.09 seconds 💨 96% in 27 seconds (☆127)
- The AdEMAMix Optimizer: Better, Faster, Older (☆132)
- A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac… (☆212)
- Reading list for research topics in state-space models (☆209)
- Some preliminary explorations of Mamba's context scaling (☆184)