srush / annotated-s4
Implementation of https://srush.github.io/annotated-s4
☆495 Updated 2 years ago
Alternatives and similar repositories for annotated-s4
Users who are interested in annotated-s4 are comparing it to the libraries listed below.
- ☆290 Updated 4 months ago
- Annotated version of the Mamba paper ☆482 Updated last year
- Sequence modeling with Mega. ☆295 Updated 2 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆757 Updated last year
- ☆178 Updated last year
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆681 Updated 6 months ago
- For optimization algorithm research and development. ☆518 Updated this week
- ☆352 Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆584 Updated this week
- Transformer based on a variant of attention that has linear complexity with respect to sequence length ☆768 Updated last year
- ☆267 Updated 10 months ago
- Structured state space sequence models ☆2,633 Updated 10 months ago
- Implementation of Block Recurrent Transformer - Pytorch ☆217 Updated 9 months ago
- Accelerated First Order Parallel Associative Scan ☆180 Updated 9 months ago
- Library for reading and processing ML training data. ☆447 Updated this week
- maximal update parametrization (µP) ☆1,526 Updated 10 months ago
- Named tensors with first-class dimensions for PyTorch ☆329 Updated last year
- Language Modeling with the H3 State Space Model ☆518 Updated last year
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆333 Updated 11 months ago
- Code for our NeurIPS 2022 paper ☆368 Updated 2 years ago
- Implementation of the proposed minGRU in Pytorch ☆296 Updated 2 months ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆203 Updated last year
- ☆256 Updated 2 years ago
- Reading list for research topics in state-space models ☆289 Updated this week
- [ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834) ☆580 Updated last year
- ☆163 Updated 2 years ago
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆117 Updated 7 months ago
- CLU lets you write beautiful training loops in JAX. ☆343 Updated last month
- ☆376 Updated last year
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆374 Updated last month