opallab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14 Updated last week
Alternatives and similar repositories for positional_attention
Users who are interested in positional_attention are comparing it to the repositories listed below.
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 Updated last week
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated 2 years ago
- ☆31 Updated 7 months ago
- ☆53 Updated 8 months ago
- A simple example of VAEs with KANs ☆12 Updated last year
- 🧮 Algebraic Positional Encodings. ☆13 Updated 4 months ago
- ☆11 Updated 3 months ago
- Code for the paper "Function-Space Learning Rates" ☆20 Updated last month
- ☆21 Updated 8 months ago
- Deep Networks Grok All the Time and Here is Why ☆35 Updated last year
- Minimum Description Length probing for neural network representations ☆19 Updated 4 months ago
- ☆31 Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 Updated last year
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆18 Updated 7 months ago
- The Energy Transformer block, in JAX ☆56 Updated last year
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 Updated 2 months ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆27 Updated 3 months ago
- Repository for Sparse Universal Transformers ☆18 Updated last year
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆40 Updated last year
- Your favourite classical machine learning algos on the GPU/TPU ☆20 Updated 5 months ago
- Efficient Scaling laws and collaborative pretraining. ☆16 Updated 4 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 Updated 4 months ago
- Implementation of Spectral State Space Models ☆16 Updated last year
- Fork of Flame repo for training of some new stuff in development ☆13 Updated last week
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 8 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 Updated 9 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆36 Updated 11 months ago
- ☆19 Updated 2 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆21 Updated 2 months ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 Updated 4 years ago