opallab / positional_attention
Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning"
☆14 · Updated last month
Alternatives and similar repositories for positional_attention:
Users interested in positional_attention are comparing it to the repositories listed below.
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated last year
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 · Updated last week
- Implementation of Spectral State Space Models ☆16 · Updated last year
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior ☆41 · Updated last year
- Minimum Description Length probing for neural network representations ☆19 · Updated last month
- Official code for the paper "Attention as a Hypernetwork" ☆25 · Updated 8 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- An annotated implementation of the Hyena Hierarchy paper ☆32 · Updated last year
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 · Updated 9 months ago
- JAX/Flax implementation of the Hyena Hierarchy ☆34 · Updated last year
- Your favourite classical machine learning algos on the GPU/TPU ☆20 · Updated 2 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆53 · Updated 7 months ago
- Repository for Sparse Universal Transformers ☆17 · Updated last year
- A simple example of VAEs with KANs ☆12 · Updated 10 months ago
- Official code repository for the paper "Key-value memory in the brain" ☆24 · Updated 3 weeks ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆23 · Updated 7 months ago
- Efficient scaling laws and collaborative pretraining ☆15 · Updated last month
- Deep Networks Grok All the Time and Here is Why ☆30 · Updated 10 months ago
- Simple, scalable discrete diffusion for text in PyTorch ☆33 · Updated 5 months ago
- Using FlexAttention to compute attention with different masking patterns ☆42 · Updated 5 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers ☆17 · Updated last week
- The Energy Transformer block, in JAX ☆56 · Updated last year