kklemon / FlashPerceiver

Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.

☆26

Alternatives and similar repositories for FlashPerceiver

Users who are interested in FlashPerceiver are comparing it to the repositories listed below.

idiap / sigma-gpt
σ-GPT: A New Approach to Autoregressive Models
☆67 Updated 11 months ago
google-deepmind / spectral_ssm
☆33 Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100 Updated 11 months ago
dvruette / barrel-rec-pytorch
☆53 Updated last year
lucidrains / self-reasoning-tokens-pytorch
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
☆56 Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127 Updated 8 months ago
martin-marek / batch-size
📄 Small Batch Size Training for Language Models
☆36 Updated last week
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130 Updated last year
main-horse / hnet
H-Net Dynamic Hierarchical Architecture
☆65 Updated 2 weeks ago
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25 Updated 6 months ago
hyhieu / easy_pybind
☆32 Updated last year
ChenWu98 / algorithmic-creativity
[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
☆56 Updated 2 months ago
lucidrains / firefly-torch
Exploration into the Firefly algorithm in Pytorch
☆40 Updated 5 months ago
epfml / DenseFormer
☆81 Updated last year
kvfrans / splus
☆115 Updated last month
vmicheli / delta-iris
Efficient World Models with Context-Aware Tokenization. ICML 2024
☆105 Updated 10 months ago
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆76 Updated 7 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆185 Updated 8 months ago
radarFudan / mamba-minimal-jax
☆31 Updated 8 months ago
yilundu / ired_code_release
☆67 Updated last year
cloneofsimo / zeroshampoo
☆34 Updated 10 months ago
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆91 Updated last week
p-doom / jasmine
A simple, performant and scalable JAX-based world modeling codebase
☆58 Updated this week
bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆131 Updated last month
lucidrains / spline-based-transformer
Implementation of the proposed Spline-Based Transformer from Disney Research
☆102 Updated 8 months ago
dvruette / gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
☆97 Updated last month
AllanYangZhou / universal_neural_functional
☆51 Updated last year
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆44 Updated 2 years ago
HanGuo97 / log-linear-attention
☆232 Updated 2 months ago
cloneofsimo / scaling-guide
WIP
☆94 Updated 11 months ago