hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆36Updated 2 months ago
Alternatives and similar repositories for Terminator
Users who are interested in Terminator are comparing it to the libraries listed below.
- Minimal Implementation of Visual Autoregressive Modelling (VAR)☆33Updated 2 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"☆104Updated 3 weeks ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models"☆222Updated last year
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n…☆42Updated 6 months ago
- Mixture of A Million Experts☆46Updated 10 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆54Updated 9 months ago
- My take on Flow Matching☆57Updated 4 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind☆126Updated 9 months ago
- Explorations into improving ViTArc with Slot Attention☆41Updated 7 months ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21Updated last week
- Explorations into the recently proposed Taylor Series Linear Attention☆99Updated 9 months ago
- σ-GPT: A New Approach to Autoregressive Models☆64Updated 9 months ago
- ☆41Updated 4 months ago
- ☆79Updated 9 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆54Updated last year
- ☆60Updated 4 months ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch☆63Updated last month
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models".☆38Updated 7 months ago
- Official Code Repository for the paper "Continuous Diffusion Model for Language Modeling".☆29Updated 2 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling☆21Updated 2 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation☆42Updated 3 months ago
- Official implementation of ECCV24 paper: POA☆24Updated 9 months ago
- Focused on fast experimentation and simplicity☆73Updated 5 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)☆51Updated 2 months ago
- ☆22Updated 2 weeks ago
- ☆80Updated last year
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…☆108Updated 8 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind☆50Updated this week
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆25Updated last month
- Implementation of Agent Attention in Pytorch☆90Updated 10 months ago