srush / annotated-s4
Implementation of "The Annotated S4" (https://srush.github.io/annotated-s4), a step-by-step, executable walkthrough of the S4 state space model.
☆487 · Updated 2 years ago
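For orientation, the core recipe the tutorial builds up can be sketched in a few lines of JAX: discretize a diagonal state space model with zero-order hold, then run it as a linear recurrence over the input. This is a minimal sketch, not the repository's code; the function names, toy shapes, and step size are all illustrative.

```python
import jax
import jax.numpy as jnp

def discretize(A, B, step):
    # Zero-order-hold discretization of a *diagonal* continuous SSM:
    #   A_bar = exp(step * A),  B_bar = (A_bar - 1) / A * B
    A_bar = jnp.exp(step * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_scan(A_bar, B_bar, C, u):
    # Linear recurrence: x_k = A_bar * x_{k-1} + B_bar * u_k,  y_k = <C, x_k>
    def body(x, u_k):
        x = A_bar * x + B_bar * u_k
        return x, jnp.dot(C, x)
    _, ys = jax.lax.scan(body, jnp.zeros_like(A_bar), u)
    return ys

# Toy usage: state size 4, scalar input sequence of length 16.
kA, kC, ku = jax.random.split(jax.random.PRNGKey(0), 3)
A = -jnp.abs(jax.random.normal(kA, (4,)))  # stable (negative real) dynamics
B = jnp.ones((4,))
C = jax.random.normal(kC, (4,))
y = ssm_scan(*discretize(A, B, 0.1), C, jax.random.normal(ku, (16,)))
print(y.shape)  # (16,)
```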
Alternatives and similar repositories for annotated-s4:
Users interested in annotated-s4 are comparing it to the libraries listed below.
- ☆287 · Updated 3 months ago
- Annotated version of the Mamba paper ☆478 · Updated last year
- ☆175 · Updated 10 months ago
- Sequence modeling with Mega. ☆295 · Updated 2 years ago
- Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch ☆661 · Updated 4 months ago
- For optimization algorithm research and development. ☆504 · Updated this week
- Structured state space sequence models ☆2,601 · Updated 8 months ago
- Language Modeling with the H3 State Space Model ☆519 · Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX ☆564 · Updated this week
- Accelerated First Order Parallel Associative Scan; a minimal sketch of the parallel-scan trick follows this list ☆180 · Updated 7 months ago
- Implementation of Block Recurrent Transformer in PyTorch ☆218 · Updated 7 months ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆203 · Updated last year
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (the "Heisen sequence" trick); see the scan sketch after this list ☆112 · Updated 5 months ago
- Long Range Arena for Benchmarking Efficient Transformers ☆749 · Updated last year
- ☆344 · Updated 11 months ago
- Reading list for research topics in state-space models ☆274 · Updated 2 months ago
- ☆164 · Updated 2 years ago
- ☆172 · Updated 4 months ago
- ☆255 · Updated 2 years ago
- Unofficial JAX implementations of deep learning research papers ☆154 · Updated 2 years ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆325 · Updated 9 months ago
- Code for our NeurIPS 2022 paper ☆367 · Updated 2 years ago
- ☆165 · Updated last year
- Implementation of a memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory"; a chunked single-head sketch follows this list ☆374 · Updated last year
- ☆215 · Updated 8 months ago
- Helpful tools and examples for working with flex-attention ☆707 · Updated this week
- Named tensors with first-class dimensions for PyTorch ☆319 · Updated last year
- VQ-VAEs, Gumbel-Softmaxes, and friends ☆559 · Updated 3 years ago
- A curated list of awesome discrete diffusion model resources ☆285 · Updated 2 months ago
- Neural Networks and the Chomsky Hierarchy ☆205 · Updated 11 months ago
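Two of the entries above, the accelerated associative scan and the one-file Mamba built on logcumsumexp, rest on the same observation: the first-order recurrence x_k = a_k * x_{k-1} + b_k can be computed in logarithmic parallel depth, because composing the affine maps x -> a*x + b is associative. A minimal JAX sketch of that trick; the names are illustrative, not any listed repository's API.

```python
import jax
import jax.numpy as jnp

def combine(left, right):
    # Compose two affine maps x -> a*x + b. Composition is associative,
    # which is what lets the scan run in O(log n) parallel depth.
    a_l, b_l = left
    a_r, b_r = right
    return a_l * a_r, a_r * b_l + b_r

def linear_recurrence(a, b):
    # Returns x with x[k] = a[k] * x[k-1] + b[k] and x[-1] = 0.
    _, x = jax.lax.associative_scan(combine, (a, b))
    return x

a = jnp.full((8,), 0.5)
b = jnp.ones((8,))
print(linear_recurrence(a, b))  # matches the sequential loop: 1, 1.5, 1.75, ...
```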
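Similarly, the memory-efficient attention entry refers to the streaming-softmax idea from "Self-attention Does Not Need O(n²) Memory": process keys and values in chunks while carrying a running max, numerator, and denominator, so the full n x n score matrix is never materialized. A single-head, unmasked sketch with illustrative names and chunk size:

```python
import jax
import jax.numpy as jnp

def chunked_attention(q, k, v, chunk=128):
    n, d = k.shape
    m = jnp.full((q.shape[0],), -jnp.inf)      # running max of scores
    num = jnp.zeros((q.shape[0], v.shape[1]))  # running softmax numerator
    den = jnp.zeros((q.shape[0],))             # running softmax denominator
    for start in range(0, n, chunk):           # plain Python loop for clarity
        s = q @ k[start:start + chunk].T / jnp.sqrt(d)
        m_new = jnp.maximum(m, s.max(axis=1))
        scale = jnp.exp(m - m_new)             # rescale old partial sums
        p = jnp.exp(s - m_new[:, None])
        num = num * scale[:, None] + p @ v[start:start + chunk]
        den = den * scale + p.sum(axis=1)
        m = m_new
    return num / den[:, None]

# Sanity check against the quadratic reference implementation.
q = k = v = jnp.eye(4)
full = jax.nn.softmax(q @ k.T / jnp.sqrt(4.0)) @ v
assert jnp.allclose(chunked_attention(q, k, v, chunk=2), full, atol=1e-5)
```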