shawntan / SUT
Repository for Sparse Universal Transformers
☆17 · Updated last year
Related projects
Alternatives and complementary repositories for SUT
- ☆24 · Updated last month
- ☆44 · Updated last year
- ☆50 · Updated 5 months ago
- ☆46 · Updated last month
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" · ☆36 · Updated 11 months ago
- ☆21 · Updated last month
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ☆49 · Updated last year
- ☆45 · Updated 9 months ago
- The Energy Transformer block, in JAX · ☆50 · Updated 11 months ago
- ☆22 · Updated this week
- ☆27 · Updated 7 months ago
- Stick-breaking attention · ☆33 · Updated this week
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" · ☆14 · Updated last month
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. · ☆38 · Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ☆95 · Updated 6 months ago
- ☆31 · Updated 10 months ago
- ☆21 · Updated 2 years ago
- Official implementation of the transformer (TF) architecture suggested in the paper "Looped Transformers as Programmable Computers… · ☆22 · Updated last year
- An annotated implementation of the Hyena Hierarchy paper · ☆31 · Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆35 · Updated 11 months ago
- Code release for the "Broken Neural Scaling Laws" (BNSL) paper · ☆57 · Updated last year
- ☆50 · Updated 2 weeks ago
- ☆11 · Updated last year
- Efficient PScan implementation in PyTorch · ☆15 · Updated 10 months ago
- Official code for the paper "Attention as a Hypernetwork" · ☆23 · Updated 4 months ago
- Universal Neurons in GPT2 Language Models · ☆26 · Updated 5 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 · ☆19 · Updated last year
- Parallel Associative Scan for Language Models · ☆18 · Updated 10 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) · ☆32 · Updated 5 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆24 · Updated 6 months ago