dtreai / Griffin-Jax

Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"

☆13

Alternatives and similar repositories for Griffin-Jax:

Users who are interested in Griffin-Jax are comparing it to the libraries listed below.

sustcsonglin / mamba-triton
☆46 Updated 11 months ago
berlino / seq_icl
☆51 Updated 7 months ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆62 Updated 8 months ago
sjelassi / transformers_ssm_copy
☆29 Updated 10 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆25 Updated 9 months ago
abhishekpanigrahi1996 / transformer_in_transformer
☆44 Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆46 Updated last year
HazyResearch / prefix-linear-attention
☆47 Updated 6 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆36 Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆20 Updated last week
srush / mamba-primer
☆37 Updated 9 months ago
srush / mamba-scans
Blog post
☆16 Updated 11 months ago
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆35 Updated last year
IdoAmos / not-from-scratch
☆26 Updated 2 months ago
Doraemonzzz / hgru2-pytorch
☆24 Updated 3 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆40 Updated 3 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆41 Updated last week
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆66 Updated 9 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆67 Updated 2 months ago
RobertCsordas / moeut
☆69 Updated 5 months ago
google-deepmind / spectral_ssm
☆31 Updated 9 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated 7 months ago
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆21 Updated 2 months ago
gregorbachmann / Next-Token-Failures
☆78 Updated 10 months ago
radarFudan / mamba-minimal-jax
☆31 Updated last month
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆79 Updated 10 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆52 Updated 5 months ago
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆15 Updated last year
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆74 Updated 7 months ago
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆28 Updated 2 months ago