borjanG / 2023-transformersLinks

Codes for the paper The emergence of clusters in self-attention dynamics.

☆16

Alternatives and similar repositories for 2023-transformers

Users that are interested in 2023-transformers are comparing it to the libraries listed below

Sorting:

formll / dog
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
☆63Updated last year
mkhodak / relax
☆15Updated 3 years ago
shikaiqiu / compute-better-spent
☆53Updated 9 months ago
teddykoker / grokking
PyTorch implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
☆37Updated 3 years ago
davzha / multiset-equivariance
☆13Updated 3 years ago
jmtomczak / git_flow
General Invertible Transformations for Flow-based Generative Models
☆18Updated 4 years ago
remilepriol / causal-adaptation-speed
Investigate the speed of adaptation of structural causal models
☆15Updated 4 years ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆36Updated 8 months ago
dylandoblar / noether-networks
Meta-learning inductive biases in the form of useful conserved quantities.
☆37Updated 2 years ago
stanislavfort / dissect-git-re-basin
Replicating and dissecting the git-re-basin project in one-click-replication Colabs
☆36Updated 2 years ago
ejmichaud / grokking-squared
☆26Updated 2 years ago
yizhangzzz / transformers-lego
☆18Updated 2 years ago
bhoov / energy-transformer-jax
The Energy Transformer block, in JAX
☆59Updated last year
sjunhongshen / DASH
☆23Updated 2 years ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆84Updated last year
jxbz / entropix
📰 Computing the information content of trained neural networks
☆21Updated 3 years ago
AndPotap / einsum-search
☆32Updated 9 months ago
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆124Updated last year
IDSIA / neuraldiffeq-fwp
Official repository for the paper "Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules" (…
☆23Updated last month
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63Updated last year
ExtensityAI / benchmark
Evaluation of neuro-symbolic engines
☆38Updated 11 months ago
srush / mamba-scans
Blog post
☆17Updated last year
KindXiaoming / Omnigrok
Omnigrok: Grokking Beyond Algorithmic Data
☆58Updated 2 years ago
nick11roberts / XD
☆12Updated 3 years ago
aks2203 / deep-thinking
A centralized place for deep thinking code and experiments
☆85Updated last year
KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆35Updated last year
ganguli-lab / degrees-of-freedom
☆37Updated 3 years ago
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59Updated last year
runame / laplace-redux
Laplace Redux -- Effortless Bayesian Deep Learning
☆42Updated last month
akandykeller / NeuralWaveMachines
Official Implementation of the ICML 2023 paper: "Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally …
☆72Updated 2 years ago