facebookresearch / LAWT
Code for the papers "Linear Algebra with Transformers" (TMLR) and "What is my Math Transformer Doing?" (AI for Maths Workshop, NeurIPS 2022)
☆61 Updated last month
Related projects:
- ICML 2022: Learning Iterative Reasoning through Energy Minimization ☆42 Updated last year
- ☆42 Updated 3 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆84 Updated 4 months ago
- Lightning-like training API for JAX with Flax ☆28 Updated 4 months ago
- Official source code for "Graph Neural Networks for Learning Equivariant Representations of Neural Networks". In ICLR 2024 (oral). ☆63 Updated last month
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆46 Updated last year
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆45 Updated last month
- An annotated implementation of the Hyena Hierarchy paper ☆30 Updated last year
- Fast training of unitary deep network layers from low-rank updates ☆28 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆74 Updated 7 months ago
- ☆48 Updated 3 months ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated last year
- ☆47 Updated 3 months ago
- Code for our paper "Generative Flow Networks for Discrete Probabilistic Modeling" ☆73 Updated last year
- Use JAX functions in PyTorch ☆224 Updated last year
- Transformers with doubly stochastic attention ☆40 Updated 2 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆34 Updated last year
- GFlowNets, MCMC, Metropolis-Hastings, Gibbs sampling, Metropolis-adjusted Langevin, Inverse Transform Sampling, Acceptance-Rejection Metho… ☆81 Updated last year
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆38 Updated 7 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆85 Updated last month
- Differentiable Algorithms and Algorithmic Supervision. ☆101 Updated last year
- ☆65 Updated 9 months ago
- Sequence Modeling with Structured State Spaces ☆60 Updated 2 years ago
- Flow-matching algorithms in JAX ☆62 Updated last month
- ☆32 Updated 9 months ago
- This repository contains PyTorch implementations of various random feature maps for dot product kernels. ☆17 Updated 2 months ago
- JAX/Flax implementation of the Hyena Hierarchy ☆29 Updated last year
- ☆28 Updated last week
- Deep Learning & Information Bottleneck ☆45 Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [arXiv, Oct 2023] ☆41 Updated 11 months ago