glassroom / heinsen_routing
Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.
☆172 · Updated 2 years ago
Alternatives and similar repositories for heinsen_routing
Users interested in heinsen_routing are comparing it to the libraries listed below.
- Official repository for the paper "A Modern Self-Referential Weight Matrix That Learns to Modify Itself" (ICML 2022 & NeurIPS 2021 Deep R… ☆173 · Updated 3 months ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆94 · Updated 9 months ago
- ☆254 · Updated 2 years ago
- Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆115 · Updated last year
- A case study of efficient training of large language models using commodity hardware. ☆68 · Updated 3 years ago
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆201 · Updated 2 years ago
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ☆634 · Updated 2 years ago
- a curated list of data for reasoning ai ☆137 · Updated last year
- An interpreter for RASP as described in the ICML 2021 paper "Thinking Like Transformers" ☆320 · Updated last year
- Use context-free grammars with an LLM ☆173 · Updated last year
- Controlled Text Generation via Language Model Arithmetic ☆223 · Updated last year
- Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes ☆241 · Updated 2 years ago
- Experiments for efforts to train a new and improved t5 ☆76 · Updated last year
- A pure NumPy implementation of Mamba. ☆224 · Updated last year
- [NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing" ☆75 · Updated 2 months ago
- Fast Text Classification with Compressors dictionary ☆150 · Updated 2 years ago
- Amos optimizer with JEstimator lib. ☆82 · Updated last year
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence ☆59 · Updated 3 years ago
- An implementation of the Nyströmformer, using the Nyström method to approximate standard self-attention ☆56 · Updated 3 years ago
- An interactive exploration of Transformer programming. ☆269 · Updated last year
- A playground to make it easy to try crazy things ☆33 · Updated 3 months ago
- ☆144 · Updated 2 years ago
- Automatic gradient descent ☆210 · Updated 2 years ago
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated 2 years ago
- The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).… ☆121 · Updated 2 years ago
- A repository for log-time feedforward networks ☆223 · Updated last year
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆137 · Updated last year
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ☆221 · Updated last year
- This repo contains code for the paper: "Can Foundation Models Help Us Achieve Perfect Secrecy?" ☆24 · Updated 2 years ago
- Python Research Framework ☆106 · Updated 2 years ago
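One entry above, Heinsen (2023), parallelizes the linear recurrence x_t = a_t · x_{t-1} + b_t. As a minimal illustration of why that recurrence parallelizes at all, the sketch below compares the sequential loop against the equivalent closed form built from prefix products and prefix sums; note this naive cumprod/cumsum version is for illustration only and can over/underflow on long sequences (the paper instead evaluates the same idea stably in log space), and the function names here are hypothetical, not from any listed repo.

```python
import numpy as np

def scan_sequential(a, b, x0=0.0):
    """Reference loop: x_t = a_t * x_{t-1} + b_t."""
    out, x = [], x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        out.append(x)
    return np.array(out)

def scan_parallel(a, b, x0=0.0):
    """Closed form: x_t = A_t * (x0 + sum_{i<=t} b_i / A_i),
    where A_t = prod_{i<=t} a_i. Built entirely from cumprod and
    cumsum, both of which admit parallel prefix implementations."""
    A = np.cumprod(a)
    return A * (x0 + np.cumsum(b / A))

a = np.array([0.9, 1.1, 0.8, 1.05])
b = np.array([0.1, -0.2, 0.3, 0.0])
assert np.allclose(scan_sequential(a, b, 1.0), scan_parallel(a, b, 1.0))
```

Expanding the closed form for t = 2 confirms it term by term: A_2 · (x_0 + b_1/A_1 + b_2/A_2) = a_1 a_2 x_0 + a_2 b_1 + b_2, which is exactly what two steps of the loop produce.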