opallab / positional_attention

Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"

☆14

Alternatives and similar repositories for positional_attention

Users who are interested in positional_attention are comparing it to the repositories listed below.

shikaiqiu / compute-better-spent
☆53 · Updated 10 months ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆20 · Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54 · Updated last year
expz / annotated-hyena
An annotated implementation of the Hyena Hierarchy paper
☆33 · Updated 2 years ago
lucidrains / frame-averaging-pytorch
PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network
☆50 · Updated last year
bhoov / energy-transformer-jax
The Energy Transformer block, in JAX
☆59 · Updated last year
LiibanMo / scikit-jax
Your favourite classical machine learning algos on the GPU/TPU
☆20 · Updated 7 months ago
epfml / DenseFormer
☆81 · Updated last year
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch
☆25 · Updated 6 months ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Updated 2 months ago
AhmedImtiazPrio / grok-adversarial
Deep Networks Grok All the Time and Here is Why
☆37 · Updated last year
google-deepmind / spectral_ssm
☆33 · Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in PyTorch and JAX
☆89 · Updated last year
GFNOrg / GFlowNet-EM
Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior.
☆41 · Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
☆101 · Updated 7 months ago
lucidrains / quartic-transformer
Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)
☆52 · Updated 4 months ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆85 · Updated last year
lindermanlab / elk
Scalable and Stable Parallelization of Nonlinear RNNs
☆17 · Updated 6 months ago
ruke1ire / RTF
A State-Space Model with Rational Transfer Function Representation.
☆79 · Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100 · Updated 11 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆81 · Updated 9 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 · Updated last year
flukeskywalker / nanoDD
Simple Scalable Discrete Diffusion for text in PyTorch
☆34 · Updated 10 months ago
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆66 · Updated 10 months ago
AndPotap / einsum-search
☆32 · Updated 10 months ago
lucidrains / scaling-vin-pytorch
Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group
☆36 · Updated 10 months ago
shawntan / SUT
Repository for Sparse Universal Transformers
☆19 · Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55 · Updated 11 months ago
IdoAmos / not-from-scratch
☆33 · Updated 9 months ago
dvruette / barrel-rec-pytorch
☆53 · Updated last year