IDSIA / modern-srwm
Official repository for the papers "A Modern Self-Referential Weight Matrix That Learns to Modify Itself" (ICML 2022; also presented at the NeurIPS 2021 Deep RL Workshop) and "Accelerating Neural Self-Improvement via Bootstrapping" (ICLR 2023 Workshop)
☆170 · Updated last year
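The paper's core idea is a fast weight programmer whose weight matrix generates its own update terms. As a rough illustration only, here is a minimal toy sketch of one such self-modification step, assuming the delta-rule formulation described in the paper; the `srwm_step` helper and all variable names are illustrative, not the repository's actual API.

```python
# Toy sketch of a self-referential weight matrix (SRWM) step in PyTorch.
# Assumption: the delta-rule update from the paper, with softmax-normalized
# keys/queries; names and dimensions here are hypothetical.
import torch

def srwm_step(W, x):
    """One step: W reads x and emits its output plus its own update terms."""
    d = x.shape[0]
    out = W @ x
    y, q, k = out[:d], out[d:2 * d], out[2 * d:3 * d]  # output, query, key
    beta = torch.sigmoid(out[3 * d])                   # self-generated learning rate
    sq, sk = torch.softmax(q, dim=0), torch.softmax(k, dim=0)
    v_new = W @ sq   # value the matrix wants to store
    v_old = W @ sk   # value currently associated with the key
    # Delta rule: replace the old value stored under key k with the new one.
    W = W + beta * torch.outer(v_new - v_old, sk)
    return y, W

d = 8
W = 0.1 * torch.randn(3 * d + 1, d)  # rows produce [y; q; k; beta]
x = torch.randn(d)
y, W = srwm_step(W, x)               # W has now modified itself
```

The point of the construction is that the same matrix both computes the output and programs its own update, so there is no separate slow network writing the fast weights.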
Alternatives and similar repositories for modern-srwm:
Users interested in modern-srwm are comparing it to the libraries listed below.
- The Energy Transformer block, in JAX ☆56 · Updated last year
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆92 · Updated 4 months ago
- Easy Hypernetworks in PyTorch and JAX ☆100 · Updated 2 years ago
- Hierarchical Associative Memory User Experience ☆101 · Updated last year
- ☆246 · Updated 6 months ago
- Official code repository of the paper "Linear Transformers Are Secretly Fast Weight Programmers" ☆104 · Updated 3 years ago
- Gaussian-Bernoulli Restricted Boltzmann Machines ☆104 · Updated 2 years ago
- The Abstraction and Reasoning Corpus made into a web game ☆89 · Updated 7 months ago
- Official Implementation of the ICML 2023 paper: "Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally … ☆70 · Updated last year
- Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All… ☆169 · Updated last year
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆80 · Updated 3 years ago
- ☆50 · Updated 2 years ago
- Automatic gradient descent ☆207 · Updated last year
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆203 · Updated last year
- Code for "Meta Learning Backpropagation And Improving It" @ NeurIPS 2021 https://arxiv.org/abs/2012.14905 ☆31 · Updated 3 years ago
- ☆17 · Updated 7 months ago
- Stochastic Automatic Differentiation library for PyTorch ☆198 · Updated 7 months ago
- Neural Networks and the Chomsky Hierarchy ☆205 · Updated last year
- Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes ☆238 · Updated last year
- Sequence Modeling with Structured State Spaces ☆63 · Updated 2 years ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆171 · Updated this week
- Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021) ☆48 · Updated 2 years ago
- Running JAX in PyTorch Lightning ☆93 · Updated 3 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆83 · Updated last year
- Meta-learning inductive biases in the form of useful conserved quantities ☆37 · Updated 2 years ago
- ☆103 · Updated 3 years ago
- ☆192 · Updated 10 months ago
- Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions ☆258 · Updated last year
- Implementation of the specific Transformer architecture from PaLM ("Scaling Language Modeling with Pathways") in JAX (Equinox framework) ☆187 · Updated 2 years ago
- Cellular Automata Accelerated in JAX (Oral at ICLR 2025) ☆84 · Updated last week