buttercutter / Mamba_SSM
A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)
☆ 19 · Updated 7 months ago
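The selective SSM at the heart of Mamba makes the discretization step Δ and the projections B and C input-dependent, so the state update can "select" what to remember per token. A minimal NumPy sketch of the sequential recurrence, under simplified (Euler-style) discretization of B — the paper's CUDA kernel fuses and parallelizes this scan, and all names here are illustrative, not the repository's API:

```python
import numpy as np

def selective_scan(x, A, B, C, delta):
    """Sequential reference of a selective SSM recurrence (Mamba-style).

    x:     (L, D)  input sequence (L steps, D channels)
    A:     (D, N)  state-transition parameters, shared across time
    B:     (L, N)  input-dependent input projection
    C:     (L, N)  input-dependent output projection
    delta: (L, D)  input-dependent step sizes (the "selection" mechanism)
    Returns y: (L, D)
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))               # hidden state, one N-dim state per channel
    y = np.empty((L, D))
    for t in range(L):
        # Zero-order-hold discretization of A: A_bar = exp(delta * A)
        dA = np.exp(delta[t][:, None] * A)          # (D, N)
        # Simplified Euler discretization of B: B_bar ≈ delta * B
        dB = delta[t][:, None] * B[t][None, :]      # (D, N)
        h = dA * h + dB * x[t][:, None]             # selective state update
        y[t] = h @ C[t]                             # per-channel readout
    return y
```

Because Δ gates both how fast old state decays (through `dA`) and how strongly the current input is written (through `dB`), a near-zero Δ effectively skips a token while a large Δ resets the state toward it.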
Related projects:
- CUDA implementation of autoregressive linear attention, with all the latest research findings (☆43, updated last year)
- Simple notebooks to learn diffusion models on toy datasets (☆17, updated last year)
- Directed masked autoencoders (☆13, updated last year)
- Code for the MicroAdam paper (☆9, updated 2 months ago)
- Experimental scripts for researching data-adaptive learning rate scheduling (☆23, updated 11 months ago)
- Official code for the paper "Attention as a Hypernetwork" (☆20, updated 2 months ago)
- Using FlexAttention to compute attention with different masking patterns (☆28, updated last week)
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" (☆56, updated 10 months ago)
- PyTorch implementation of "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" (☆25, updated 2 years ago)
- Linear Attention Sequence Parallelism (LASP) (☆64, updated 3 months ago)
- HGRN2: Gated Linear RNNs with State Expansion (☆46, updated last month)
- Triton implementation of the HyperAttention algorithm (☆46, updated 9 months ago)
- Implementation of "compositional attention" from MILA, a multi-head attention variant reframed as a two-step attention process wi… (☆50, updated 2 years ago)
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022] (☆17, updated 2 years ago)
- Hacks for PyTorch (☆17, updated last year)
- Toy genetic algorithm in PyTorch (☆28, updated 6 months ago)
- Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting the sequence space for mixing (☆47, updated 2 years ago)
- Implementation of MetaFormer, but in an autoregressive manner (☆22, updated 2 years ago)
- Experiments testing various linear attention designs (☆55, updated 4 months ago)