opallab / positional_attention
Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning"
☆14 · Updated last month
Related projects
Alternatives and complementary repositories for positional_attention
- ☆46 · Updated last month
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆19 · Updated last year
- ☆27 · Updated 7 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆52 · Updated last month
- An annotated implementation of the Hyena Hierarchy paper ☆31 · Updated last year
- Implementation of MambaFormer in PyTorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆20 · Updated this week
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆46 · Updated 3 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆38 · Updated 9 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 · Updated 5 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆70 · Updated 5 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆49 · Updated last year
- ☆30 · Updated 6 months ago
- Implementation of Spectral State Space Models ☆17 · Updated 8 months ago
- The Energy Transformer block, in JAX ☆50 · Updated 10 months ago
- ☆21 · Updated last month
- A simple example of VAEs with KANs ☆12 · Updated 5 months ago
- Code repository for Trajectory Flow Matching ☆23 · Updated last week
- Official code for the paper "Attention as a Hypernetwork" ☆23 · Updated 4 months ago
- Your favourite classical machine learning algos on the GPU/TPU ☆20 · Updated last month
- ☆24 · Updated last month
- This repository contains the official code for Energy Transformer, an efficient Energy-based Transformer variant for graph classificatio… ☆20 · Updated 9 months ago
- ☆27 · Updated 5 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 2 months ago
- ☆19 · Updated 11 months ago
- ☆29 · Updated last month
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf… ☆24 · Updated 3 weeks ago
- Triton Implementation of HyperAttention Algorithm ☆46 · Updated 11 months ago
- ☆50 · Updated last week
- Repository for Sparse Universal Transformers ☆17 · Updated last year