sustcsonglin / mamba-tritonLinks

☆50

Alternatives and similar repositories for mamba-triton

Users that are interested in mamba-triton are comparing it to the libraries listed below

Sorting:

HazyResearch / prefix-linear-attention
☆57Updated last year
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66Updated last year
berlino / seq_icl
☆53Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆38Updated last month
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 5 months ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
sustcsonglin / gated_linear_attention_layer
☆32Updated last year
dtunai / Griffin-Jax
Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆15Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
srush / mamba-scans
Blog post
☆17Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆29Updated 9 months ago
srush / mamba-primer
☆38Updated last year
sjelassi / transformers_ssm_copy
☆35Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85Updated last year
test-time-training / ttt-tk
☆41Updated last month
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
gregorbachmann / Next-Token-Failures
☆106Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated last year
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆38Updated 7 months ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89Updated last year
RobertCsordas / moeut
☆89Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆46Updated last year
google-deepmind / spectral_ssm
☆34Updated last year
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆17Updated 8 months ago
chijames / KERPLE
☆20Updated 3 years ago