facebookresearch / LAWT
Code for the papers "Linear Algebra with Transformers" (TMLR) and "What is my Math Transformer Doing?" (AI for Maths Workshop, NeurIPS 2022)
☆67 · Updated 7 months ago
Alternatives and similar repositories for LAWT:
Users interested in LAWT are comparing it to the libraries listed below.
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 · Updated 4 months ago
- ☆52 · Updated 5 months ago
- An annotated implementation of the Hyena Hierarchy paper ☆32 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 · Updated 6 months ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆46 · Updated last year
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h…) ☆52 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆47 · Updated last year
- ☆51 · Updated 9 months ago
- Pytorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆49 · Updated 8 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆34 · Updated last year
- ☆81 · Updated last year
- Official source code for "Graph Neural Networks for Learning Equivariant Representations of Neural Networks". In ICLR 2024 (oral). ☆77 · Updated 8 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 · Updated 3 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…" ☆35 · Updated 2 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- Lightning-like training API for JAX with Flax ☆38 · Updated 3 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆95 · Updated 7 months ago
- Blog post ☆17 · Updated last year
- ☆37 · Updated 11 months ago
- Official Code for Paper "Think While You Generate: Discrete Diffusion with Planned Denoising" [ICLR 2025] ☆50 · Updated last month
- Code for NeurIPS 2024 paper: "Noether's razor: Learning Conserved Quantities" by Tycho F. A. van der Ouderaa, Mark van der Wilk, Pim de H… ☆10 · Updated 5 months ago
- ☆45 · Updated last year
- ☆30 · Updated 5 months ago
- Repository for the paper "Lie Point Symmetry Data Augmentation for Neural PDE Solvers" ☆49 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆58 · Updated 2 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago