opallab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14Updated 3 weeks ago
Alternatives and similar repositories for positional_attention
Users who are interested in positional_attention are comparing it to the libraries listed below.
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023☆20Updated 2 years ago
- ☆53Updated 8 months ago
- An annotated implementation of the Hyena Hierarchy paper☆33Updated 2 years ago
- ☆11Updated 4 months ago
- 🧮 Algebraic Positional Encodings.☆14Updated 5 months ago
- Implementation of Spectral State Space Models☆16Updated last year
- ☆31Updated 8 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025]☆19Updated last month
- Minimum Description Length probing for neural network representations☆18Updated 4 months ago
- ☆25Updated last month
- Official implementation of "BERTs are Generative In-Context Learners"☆28Updated 3 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆17Updated 3 months ago
- Repository for Sparse Universal Transformers☆18Updated last year
- ☆21Updated 8 months ago
- ☆32Updated last year
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior.☆40Updated last year
- Code for the paper "Function-Space Learning Rates"☆20Updated 3 weeks ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)☆15Updated 5 months ago
- Simple Scalable Discrete Diffusion for text in PyTorch☆33Updated 8 months ago
- ☆32Updated 8 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆38Updated 2 weeks ago
- Efficient Scaling laws and collaborative pretraining.☆16Updated 4 months ago
- Repo for solving ARC problems with Neural Cellular Automata☆16Updated last month
- Official Code Repository for the paper "Key-value memory in the brain"☆26Updated 4 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]☆66Updated 9 months ago
- Deep Networks Grok All the Time and Here is Why☆37Updated last year
- A simple example of VAEs with KANs☆12Updated last year
- ☆19Updated 3 months ago
- Implementation for robust ViT and scaled attention☆19Updated 2 months ago
- ☆11Updated last year