kklemon / FlashPerceiver
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
☆26 Updated 4 months ago
Alternatives and similar repositories for FlashPerceiver:
Users who are interested in FlashPerceiver are comparing it to the repositories listed below:
- Focused on fast experimentation and simplicity ☆70 Updated 3 months ago
- ☆32 Updated 9 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 Updated 2 months ago
- ☆53 Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 Updated 4 months ago
- ☆33 Updated 2 months ago
- ☆30 Updated 4 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 Updated 7 months ago
- Exploration into the Firefly algorithm in PyTorch ☆35 Updated last month
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in PyTorch ☆23 Updated 2 months ago
- My take on Flow Matching ☆44 Updated 2 months ago
- ☆58 Updated 9 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆80 Updated last week
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆69 Updated 2 weeks ago
- ☆123 Updated 3 weeks ago
- Code for the paper "Function-Space Learning Rates" ☆17 Updated last month
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 Updated last month
- ☆27 Updated 9 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆88 Updated 4 months ago
- Explorations into improving ViTArc with Slot Attention ☆39 Updated 5 months ago
- research impl of Native Sparse Attention (2502.11089) ☆53 Updated last month
- A basic pure PyTorch implementation of flash attention ☆16 Updated 5 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆26 Updated 3 weeks ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆28 Updated 3 weeks ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆50 Updated 4 months ago
- ☆22 Updated 9 months ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" ☆31 Updated last year
- Collection of autoregressive model implementations ☆83 Updated last month
- JAX bindings for Flash Attention v2 ☆89 Updated 8 months ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆44 Updated 3 weeks ago