opallab / positional_attention
Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning"
☆14 · Updated 2 months ago
Alternatives and similar repositories for positional_attention:
Users interested in positional_attention are comparing it to the libraries listed below.
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 — ☆20 · Updated last year
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] — ☆18 · Updated last month
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. — ☆40 · Updated last year
- (no description) — ☆31 · Updated 5 months ago
- (no description) — ☆52 · Updated 6 months ago
- 🧮 Algebraic Positional Encodings. — ☆11 · Updated 3 months ago
- (no description) — ☆31 · Updated 6 months ago
- (no description) — ☆21 · Updated 6 months ago
- (no description) — ☆31 · Updated last year
- (no description) — ☆11 · Updated last month
- Deep Networks Grok All the Time and Here is Why — ☆33 · Updated 10 months ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. — ☆18 · Updated 5 months ago
- Minimum Description Length probing for neural network representations — ☆19 · Updated 2 months ago
- Implementation of Spectral State Space Models — ☆16 · Updated last year
- Code for the paper "Function-Space Learning Rates" — ☆18 · Updated last month
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch — ☆23 · Updated 2 months ago
- An annotated implementation of the Hyena Hierarchy paper — ☆32 · Updated last year
- Efficient scaling laws and collaborative pretraining. — ☆16 · Updated 2 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) — ☆32 · Updated 10 months ago
- A simple example of VAEs with KANs — ☆12 · Updated 10 months ago
- Triton implementation of the HyperAttention algorithm — ☆47 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" — ☆36 · Updated last year
- Implementations of growing and pruning in neural networks — ☆22 · Updated last year
- Repository for the PopulAtion Parameter Averaging (PAPA) paper — ☆26 · Updated last year
- Implementation of MambaFormer in PyTorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… — ☆20 · Updated last week
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] — ☆65 · Updated 6 months ago
- The Energy Transformer block, in JAX — ☆56 · Updated last year
- (no description) — ☆31 · Updated 3 months ago
- (no description) — ☆18 · Updated 9 months ago
- Stick-breaking attention — ☆50 · Updated last month