devvrit / SONew

☆9

Alternatives and similar repositories for SONew

Users who are interested in SONew are comparing it to the libraries listed below.

vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆84 Updated last year
srush / mamba-scans
Blog post
☆17 Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 Updated last year
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50 Updated 3 years ago
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
IDSIA / recurrent-fwp
Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021)
☆49 Updated last month
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59 Updated last year
Z-T-WANG / LaProp-Optimizer
Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient"
☆29 Updated 4 years ago
peerdavid / layerwise-batch-entropy
Layerwise Batch Entropy Regularization
☆23 Updated 2 years ago
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆16 Updated 5 months ago
radarFudan / mamba-minimal-jax
☆31 Updated 7 months ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆15 Updated 4 months ago
irhum / hyena
JAX/Flax implementation of the Hyena Hierarchy
☆34 Updated 2 years ago
google-research / precondition
☆31 Updated last month
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆44 Updated 9 months ago
shikaiqiu / compute-better-spent
☆53 Updated 9 months ago
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15 Updated last year
optimizedlearning / mechanic
☆36 Updated last year
ryoungj / mcbits
[ICML'21 Oral] Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
☆14 Updated 4 years ago
HazyResearch / prefix-linear-attention
☆56 Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
google-deepmind / enn_acme
☆31 Updated 2 years ago
SamsungSAILMontreal / nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025]
☆19 Updated last month
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆20 Updated 2 years ago
samblouir / birdie
☆13 Updated last month
teddykoker / learning-to-learn-jax
JAX implementation of Learning to learn by gradient descent by gradient descent
☆27 Updated 9 months ago
ermongroup / SPN_Variational_Inference
PyTorch implementation for "Probabilistic Circuits for Variational Inference in Discrete Graphical Models", NeurIPS 2020
☆17 Updated 3 years ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆19 Updated 2 years ago
lucidrains / learning-to-expire-pytorch
An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain
☆34 Updated 4 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆33 Updated last month