borjanG / 2023-transformers-rotf
Code for the paper "A mathematical perspective on Transformers".
☆37 Updated last year
Alternatives and similar repositories for 2023-transformers-rotf
Users who are interested in 2023-transformers-rotf are comparing it to the repositories listed below
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆84 Updated last year
- Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. ☆172 Updated 2 years ago
- The Energy Transformer block, in JAX ☆59 Updated last year
- A State-Space Model with Rational Transfer Function Representation. ☆79 Updated last year
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models ☆63 Updated last year
- ☆51 Updated last year
- LoRA for arbitrary JAX models and functions ☆140 Updated last year
- This repository contains the official code for Energy Transformer---an efficient Energy-based Transformer variant for graph classificatio… ☆24 Updated last year
- A collection of AWESOME things about information geometry ☆164 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆123 Updated 7 months ago
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆77 Updated 3 years ago
- ☆39 Updated 3 years ago
- ☆200 Updated 7 months ago
- Open source code for EigenGame. ☆30 Updated 2 years ago
- ☆53 Updated last year
- ☆53 Updated 9 months ago
- 🧱 Modula software package ☆207 Updated 3 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in PyTorch ☆25 Updated 5 months ago
- An AI benchmark for creative, human-like problem solving using Sudoku variants ☆77 Updated 2 months ago
- ☆32 Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs ☆17 Updated 5 months ago
- ☆111 Updated last month
- ☆32 Updated 9 months ago
- Omnigrok: Grokking Beyond Algorithmic Data ☆58 Updated 2 years ago
- Meta-Learning for Compositionality (MLC) for modeling human behavior ☆142 Updated last year
- ☆167 Updated 3 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆99 Updated 11 months ago
- A simple library for scaling up JAX programs ☆139 Updated 8 months ago
- Brain-like variational inference ☆55 Updated 2 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆91 Updated 4 months ago