radarFudan / mamba-minimal-jax

☆33

Alternatives and similar repositories for mamba-minimal-jax

Users who are interested in mamba-minimal-jax compare it to the libraries listed below.

vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88 Updated last year
google-deepmind / spectral_ssm
☆34 Updated last year
kvfrans / splus
☆120 Updated 4 months ago
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆36 Updated 11 months ago
machine-discovery / deer
Parallelizing non-linear sequential models over the sequence length
☆54 Updated 4 months ago
subho406 / agalite
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)
☆21 Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16 Updated last year
shikaiqiu / compute-better-spent
☆58 Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 Updated last month
sustcsonglin / mamba-triton
☆48 Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆17 Updated last year
evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆35 Updated 9 months ago
lucidrains / scaling-vin-pytorch
Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group
☆37 Updated last year
johnryan465 / pscan
☆39 Updated last year
srush / mamba-primer
☆38 Updated last year
Cranial-XIX / longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆55 Updated 10 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144 Updated 11 months ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆17 Updated 7 months ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 Updated last year
p-doom / jasmine
A simple, performant and scalable JAX-based world modeling codebase
☆76 Updated this week
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 Updated 4 months ago
AllanYangZhou / universal_neural_functional
☆52 Updated last year
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆63 Updated 3 weeks ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99 Updated last year
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆141 Updated last year
NicolasZucchet / minimal-LRU
Unofficial implementation of the Linear Recurrent Unit (LRU, Orvieto et al. 2023)
☆58 Updated last month
Benjamin-Walker / selective-ssms-and-linear-cdes
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆15 Updated 9 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆86 Updated 3 weeks ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
FLAIROx / cultural-accumulation
☆13 Updated last year