jopetty/word-problem

Experiments on the impact of depth in transformers and SSMs.

☆44

Alternatives and similar repositories for word-problem

Users interested in word-problem compare it to the repositories listed below.

automl / unlocking_state_tracking
Expanding the eigenvalue range of linear RNN state-transition matrices to include negative values improves performance on state-tracking tasks and language modeling without…
☆22 · Mar 15, 2025 · Updated last year
automl / DeltaProduct
DeltaProduct is a new linear recurrent neural network architecture that uses products of generalized Householder matrices as state-transi…
☆15 · Oct 13, 2025 · Updated 9 months ago
johanwind / wind_rwkv
☆27 · Feb 26, 2026 · Updated 4 months ago
google-deepmind / spectral_ssm
☆35 · Apr 12, 2024 · Updated 2 years ago
mayank31398 / ladder-residual-inference
☆14 · Jul 13, 2025 · Updated last year
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21 · Jul 29, 2024 · Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17 · Jan 2, 2024 · Updated 2 years ago
deep-spin / sparse-communication
☆12 · Mar 7, 2022 · Updated 4 years ago
GuoTianYu2000 / Active-Dormant-Attention
Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆11 · Dec 30, 2024 · Updated last year
Doraemonzzz / nanoTransNormer
☆11 · Oct 11, 2023 · Updated 2 years ago
HazyResearch / train-tk
train with kittens!
☆66 · Oct 25, 2024 · Updated last year
bdusell / nondeterministic-stack-rnn
Code for the paper "The Surprising Computational Power of Nondeterministic Stack RNNs" (DuSell and Chiang, 2023)
☆20 · Mar 21, 2024 · Updated 2 years ago
radarFudan / mamba-minimal-jax
☆36 · Nov 22, 2024 · Updated last year
IBM / selective-dense-state-space-model
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆16 · Sep 18, 2025 · Updated 10 months ago
samblouir / birdie
☆15 · Jun 8, 2026 · Updated last month
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆32 · Feb 25, 2025 · Updated last year
cryscan / web-rwkv-inspector
☆12 · Dec 21, 2024 · Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆58 · Feb 24, 2026 · Updated 4 months ago
LeC-Z / RWKV-nonogram
A 20M RWKV v6 can solve nonograms
☆13 · Oct 18, 2024 · Updated last year
james-oldfield / MxD
[NeurIPS'25] Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
☆16 · May 28, 2025 · Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆25 · Jun 6, 2024 · Updated 2 years ago
jungokasai / T2R
☆14 · Nov 20, 2022 · Updated 3 years ago
vvvm23 / mamba-jax
Unofficial but efficient implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆94 · Jan 25, 2024 · Updated 2 years ago
berlino / gated_linear_attention
☆107 · Mar 9, 2024 · Updated 2 years ago
thomasahle / cce
Clustered Compositional Embeddings
☆13 · Oct 25, 2023 · Updated 2 years ago
Dao-AILab / gram-newton-schulz
Fast Polar Decomposition for Muon
☆167 · Jul 2, 2026 · Updated 3 weeks ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
[EMNLP 2023] Official implementation of ETSC (Exact Toeplitz-to-SSM Conversion) from our EMNLP 2023 paper, Accelerating Toeplitz…
☆14 · Oct 17, 2023 · Updated 2 years ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆33 · May 25, 2024 · Updated 2 years ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62 · Apr 25, 2024 · Updated 2 years ago
AlirezaMorsali / MLP-Attention
☆17 · Dec 19, 2024 · Updated last year
aboustati / vargrad
Code accompanying VarGrad: A Low-Variance Gradient Estimator for Variational Inference
☆12 · Oct 12, 2020 · Updated 5 years ago
jemisjoky / umps_code
u-MPS implementation and experimentation code used in the paper Tensor Networks for Probabilistic Sequence Modeling (https://arxiv.org/ab…
☆19 · Jul 2, 2020 · Updated 6 years ago
sustcsonglin / gated_linear_attention_layer
☆32 · Jan 7, 2024 · Updated 2 years ago
berlino / seq_icl
☆54 · May 20, 2024 · Updated 2 years ago
OliverSieberling / dynamic-conv1d
Triton kernels for dynamic causal short convolutions.
☆24 · Jun 4, 2026 · Updated last month
cjyaras / monarch-attention
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention (NeurIPS'25 Spotlight)
☆26 · Feb 22, 2026 · Updated 5 months ago
fla-org / hybrid-distillation
☆34 · Dec 31, 2025 · Updated 6 months ago
Yifei-Zuo / Parallax
Official repository for Parallax (Parameterized Local Linear Attention)
☆65 · Jul 7, 2026 · Updated 2 weeks ago
viking-sudo-rm / industrial-stacknns
Stack neural networks applied to hefty natural language tasks.
☆15 · Dec 26, 2019 · Updated 6 years ago