kklemon / FlashPerceiver
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
☆26 · Updated 9 months ago
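The repository's own API is not shown on this page, so the following is only an illustrative sketch of the technique its description names, not the FlashPerceiver interface: a Perceiver compresses a long input sequence by letting a small set of learned latent vectors cross-attend to it, and PyTorch's `F.scaled_dot_product_attention` can dispatch that attention to a FlashAttention kernel on supported GPUs. The class and parameter names below are hypothetical.

```python
# Hypothetical sketch of Perceiver-style latent cross-attention
# (NOT the FlashPerceiver API): a few learned latents attend to a
# long input sequence, compressing it to a fixed-size representation.
# F.scaled_dot_product_attention may use a FlashAttention kernel
# when run on a supported GPU.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentCrossAttention(nn.Module):
    def __init__(self, dim=64, num_latents=16, heads=4):
        super().__init__()
        self.heads = heads
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, 2 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):  # x: (batch, seq_len, dim)
        b, h = x.shape[0], self.heads
        q = self.to_q(self.latents).expand(b, -1, -1)  # queries from latents
        k, v = self.to_kv(x).chunk(2, dim=-1)          # keys/values from input
        # split heads: (b, n, d) -> (b, h, n, d // h)
        split = lambda t: t.unflatten(-1, (h, -1)).transpose(1, 2)
        o = F.scaled_dot_product_attention(split(q), split(k), split(v))
        return self.out(o.transpose(1, 2).flatten(-2))

x = torch.randn(2, 1024, 64)        # long input sequence
y = LatentCrossAttention()(x)       # compressed to 16 latent tokens
print(y.shape)                      # torch.Size([2, 16, 64])
```

The key property is that attention cost scales with `num_latents * seq_len` rather than `seq_len ** 2`, which is what makes Perceiver-style models attractive for long inputs.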
Alternatives and similar repositories for FlashPerceiver
Users interested in FlashPerceiver are comparing it to the repositories listed below.
- Small Batch Size Training for Language Models · ☆43 · Updated this week
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto · ☆56 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters · ☆129 · Updated 8 months ago
- Efficient World Models with Context-Aware Tokenization (ICML 2024) · ☆105 · Updated 11 months ago
- σ-GPT: A New Approach to Autoregressive Models · ☆67 · Updated last year
- ☆53 · Updated last year
- ☆32 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention · ☆100 · Updated last year
- ☆115 · Updated 2 months ago
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction · ☆65 · Updated 3 months ago
- Implementation of a framework for Genie2 in Pytorch · ☆151 · Updated 7 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in Pytorch · ☆25 · Updated 7 months ago
- ☆69 · Updated last year
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training · ☆131 · Updated last year
- ☆82 · Updated last year
- Implementation of the proposed Spline-Based Transformer from Disney Research · ☆102 · Updated 9 months ago
- Exploration into the Firefly algorithm in Pytorch · ☆40 · Updated 6 months ago
- H-Net Dynamic Hierarchical Architecture · ☆77 · Updated last month
- ☆34 · Updated last year
- ☆237 · Updated 2 months ago
- A basic pure-PyTorch implementation of flash attention · ☆16 · Updated 9 months ago
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster · ☆70 · Updated 3 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model · ☆54 · Updated 8 months ago
- Implementation of a transformer for reinforcement learning using `x-transformers` · ☆66 · Updated 3 weeks ago
- Focused on fast experimentation and simplicity · ☆76 · Updated 8 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI · ☆289 · Updated 2 months ago
- Code for the paper "Function-Space Learning Rates" · ☆23 · Updated 2 months ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group · ☆36 · Updated 11 months ago
- ☆31 · Updated 9 months ago
- Normalized Transformer (nGPT) · ☆186 · Updated 9 months ago