shawntan / SUTLinks

Repository for Sparse Universal Transformers

☆19

Alternatives and similar repositories for SUT

Users that are interested in SUT are comparing it to the libraries listed below

Sorting:

abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40Updated last year
berlino / seq_icl
☆53Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆59Updated last month
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated last month
machine-discovery / deer
Parallelizing non-linear sequential models over the sequence length
☆53Updated last month
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆81Updated 9 months ago
shikaiqiu / compute-better-spent
☆53Updated 10 months ago
sustcsonglin / mamba-triton
☆49Updated last year
AndPotap / einsum-search
☆32Updated 10 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
google-deepmind / spectral_ssm
☆33Updated last year
opallab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14Updated 2 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆32Updated 9 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated 11 months ago
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆11Updated 3 weeks ago
insuhan / hyper-attn
☆81Updated last year
RobertCsordas / moeut
☆83Updated 11 months ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆20Updated 2 years ago
sjelassi / transformers_ssm_copy
☆33Updated last year
bentherien / mu_learned_optimization
[Oral; Neurips OPT2024 ] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆13Updated 4 months ago
Cranial-XIX / longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆54Updated 8 months ago
radarFudan / mamba-minimal-jax
☆31Updated 8 months ago
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16Updated last year
sustcsonglin / gated_linear_attention_layer
☆32Updated last year
Silent-Zebra / twisted-smc-lm
☆30Updated 4 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
bhoov / energy-transformer-jax
The Energy Transformer block, in JAX
☆59Updated last year