glassroom / heinsen_routing
Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.
☆169Updated 2 years ago
Alternatives and similar repositories for heinsen_routing
Users who are interested in heinsen_routing are comparing it to the libraries listed below.
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate …☆635Updated last year
- Official repository for the paper "A Modern Self-Referential Weight Matrix That Learns to Modify Itself" (ICML 2022 & NeurIPS 2021 Deep R…☆172Updated 2 weeks ago
- Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes☆239Updated 2 years ago
- A repository for log-time feedforward networks☆222Updated last year
- Implementation of Block Recurrent Transformer - Pytorch☆219Updated 10 months ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)☆94Updated 6 months ago
- Automatic gradient descent☆208Updated 2 years ago
- ☆252Updated last year
- Language Modeling with the H3 State Space Model☆519Updated last year
- Amos optimizer with JEstimator lib.☆82Updated last year
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence☆59Updated 3 years ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi…☆345Updated 10 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023☆136Updated last year
- TART: A plug-and-play Transformer module for task-agnostic reasoning☆198Updated 2 years ago
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT☆214Updated 10 months ago
- A case study of efficient training of large language models using commodity hardware.☆69Updated 2 years ago
- A curated list of data for reasoning AI☆136Updated 10 months ago
- The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…☆121Updated 2 years ago
- Open weights language model from Google DeepMind, based on Griffin.☆641Updated 3 weeks ago
- Visualize the intermediate output of Mistral 7B☆367Updated 5 months ago
- Standalone Product Key Memory module in Pytorch - for augmenting Transformer models☆80Updated 10 months ago
- Experiments for efforts to train a new and improved T5☆77Updated last year
- My explorations into editing the knowledge and memories of an attention network☆35Updated 2 years ago
- Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and …☆206Updated 7 months ago
- ☆143Updated 2 years ago
- [NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"☆72Updated this week
- Used for adaptive human in the loop evaluation of language and embedding models.☆309Updated 2 years ago
- 🤖 A PyTorch library of curated Transformer models and their composable components☆891Updated last year
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"☆59Updated 3 years ago
- The repository for the code of the UltraFastBERT paper☆516Updated last year