Repository for Sparse Universal Transformers
☆20, updated Oct 23, 2023
Alternatives and similar repositories for SUT
Users interested in SUT are comparing it to the repositories listed below.
- ☆11, updated Oct 11, 2023
- lanmt ebm (☆12, updated Jun 19, 2020)
- Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021) (☆12, updated Mar 7, 2024)
- Combining SOAP and MUON (☆19, updated Feb 11, 2025)
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… (☆21, updated Mar 15, 2025)
- Code and data for paper "(How) do Language Models Track State?" (☆20, updated Mar 31, 2025)
- Read and write NumPy .npy and .npz files (☆18, updated Sep 19, 2023)
- Wrapper to easily generate the chat template for Llama2 (☆65, updated Mar 10, 2024)
- Array slices for Common Lisp (☆18, updated May 14, 2021)
- A simple wrapper around CFFI to enable contiguously allocated arrays of structures in Common Lisp (☆19, updated Sep 21, 2023)
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence (☆58, updated Nov 11, 2025)
- A modular project skeleton generator (☆21, updated Mar 7, 2023)
- ☆24, updated Sep 25, 2024
- Neuralisp is a modular machine learning framework for Common Lisp, focused on deep learning models. It offers a high-performance tensor l… (☆21, updated Nov 10, 2025)
- ☆29, updated Jul 9, 2024
- The Energy Transformer block, in JAX (☆64, updated Dec 14, 2023)
- Official Code Repository for the paper "Key-value memory in the brain" (☆31, updated Feb 25, 2025)
- ☆35, updated Apr 12, 2024
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling (☆40, updated Dec 2, 2023)
- Code for reproducing the paper "Neural Networks Fail to Learn Periodic Functions and How to Fix It" as part of the ML Reproducibility Cha… (☆11, updated Apr 16, 2021)
- Official code for the paper "Attention as a Hypernetwork" (☆51, updated Feb 24, 2026)
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…