hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆36 Updated this week
Alternatives and similar repositories for Terminator:
Users who are interested in Terminator are comparing it to the libraries listed below.
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆42 Updated 4 months ago
- Minimal Implementation of Visual Autoregressive Modelling (VAR) ☆29 Updated last week
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆216 Updated 10 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆96 Updated 7 months ago
- Official code for "TOAST: Transfer Learning via Attention Steering" ☆190 Updated last year
- Explorations into improving ViTArc with Slot Attention ☆39 Updated 5 months ago
- Implementation of Agent Attention in Pytorch ☆90 Updated 8 months ago
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆49 Updated 10 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆118 Updated 5 months ago
- Implementation of Infini-Transformer in Pytorch ☆110 Updated 3 months ago
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆50 Updated 2 months ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆20 Updated 2 weeks ago
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta ☆115 Updated 2 months ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch ☆57 Updated last month
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆47 Updated last week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 Updated 11 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 Updated 7 months ago
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models". ☆38 Updated 4 months ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ☆88 Updated last year
- ☆30 Updated 2 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆62 Updated 7 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆163 Updated 2 months ago
- More dimensions = More fun ☆21 Updated 8 months ago
- Mixture of A Million Experts ☆42 Updated 8 months ago
- My take on Flow Matching ☆44 Updated 2 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 Updated 10 months ago
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆115 Updated last month
- Focused on fast experimentation and simplicity ☆70 Updated 3 months ago
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆33 Updated 4 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆126 Updated 2 months ago