bobby-he / simplified_transformers
☆283 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for simplified_transformers
- Annotated version of the Mamba paper ☆457 · Updated 8 months ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆246 · Updated 6 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆293 · Updated 5 months ago
- Reading list for research topics in state-space models ☆241 · Updated 2 weeks ago
- Hugging Face-compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,… ☆227 · Updated 8 months ago
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆100 · Updated 11 months ago
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆200 · Updated 5 months ago
- Official implementation of "TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters" ☆335 · Updated last week
- Implementation of a memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory" ☆360 · Updated last year
- Helpful tools and examples for working with flex-attention ☆469 · Updated 3 weeks ago
- Implementation of rotary embeddings, from the RoFormer paper, in PyTorch ☆571 · Updated last week
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆49 · Updated last week
- Understand and test language-model architectures on synthetic tasks ☆162 · Updated 6 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆537 · Updated 6 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NVIDIA AI ☆256 · Updated 2 weeks ago
- Implementation of Block Recurrent Transformer in PyTorch ☆213 · Updated 3 months ago
- Sequence modeling with Mega ☆298 · Updated last year
- An implementation of local windowed attention for language modeling ☆384 · Updated 2 months ago
- Some preliminary explorations of Mamba's context scaling ☆191 · Updated 9 months ago
- Official implementation of "TransNormerLLM: A Faster and Better LLM" ☆229 · Updated 9 months ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence) ☆102 · Updated last month
- Code release for "Dropout Reduces Underfitting" ☆312 · Updated last year
- Notes on Mamba and the S4 model ("Mamba: Linear-Time Sequence Modeling with Selective State Spaces") ☆149 · Updated 10 months ago
- [ECCV 2024] Official PyTorch implementation of RoPE-ViT, "Rotary Position Embedding for Vision Transformer" ☆213 · Updated 3 weeks ago
- Collection of papers on state-space models ☆556 · Updated 2 weeks ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆214 · Updated 3 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆103 · Updated 3 months ago