hyperevolnet / Terminator
The official repository for "HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction".
☆31 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for Terminator
- Explorations into the recently proposed Taylor Series Linear Attention (sketched after this list) ☆90 · Updated 3 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (sketched after this list) ☆85 · Updated 2 months ago
- Explorations into improving ViTArc with Slot Attention ☆37 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 7 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆59 · Updated 3 months ago
- Implementation of Agent Attention in PyTorch ☆86 · Updated 4 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models") ☆78 · Updated 2 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆76 · Updated last week
- Implementation of a modular, high-performance, and simple Mamba for high-speed applications ☆33 · Updated last week
- Implementation of a multimodal diffusion transformer in PyTorch ☆97 · Updated 4 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆95 · Updated 2 weeks ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He of DeepMind (sketched after this list) ☆112 · Updated 2 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"☆103Updated 3 months ago
- Implementation of MambaFormer in PyTorch and Zeta, from the paper "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks" ☆21 · Updated last week
- Mixture of A Million Experts ☆32 · Updated 3 months ago
- Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts ☆108 · Updated last month
- A State-Space Model with Rational Transfer Function Representation ☆70 · Updated 6 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 6 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆43 · Updated last month
- More dimensions = More fun ☆21 · Updated 3 months ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆200 · Updated 5 months ago
- Collection of autoregressive model implementations ☆67 · Updated this week
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆53 · Updated 2 months ago
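A few of the techniques named above are simple enough to sketch. First, the Taylor series linear attention explorations: the idea is to replace softmax attention's exp(q·k) with its second-order Taylor expansion, 1 + q·k + (q·k)²/2, which factorizes through an explicit feature map and makes attention linear in sequence length. Below is a minimal non-causal PyTorch sketch; the function names and shapes are illustrative, not that repo's actual API.

```python
# Minimal sketch of second-order Taylor linear attention (illustrative, not the repo's API).
# exp(q·k) ≈ 1 + q·k + (q·k)^2 / 2 factorizes through phi(x) = [1, x, vec(x ⊗ x)/sqrt(2)],
# so attention can be computed in O(n) instead of O(n^2).
import torch

def taylor_feature_map(x):
    # x: (batch, seq, dim) -> (batch, seq, 1 + dim + dim*dim)
    b, n, d = x.shape
    x2 = torch.einsum('bni,bnj->bnij', x, x).reshape(b, n, d * d) / (2 ** 0.5)
    ones = torch.ones(b, n, 1, device=x.device, dtype=x.dtype)
    return torch.cat([ones, x, x2], dim=-1)

def taylor_linear_attention(q, k, v, eps=1e-6):
    q = taylor_feature_map(q * q.shape[-1] ** -0.5)  # scale queries as softmax attention would
    k = taylor_feature_map(k)
    kv = torch.einsum('bnf,bnd->bfd', k, v)          # sum over keys of phi(k) v^T
    z = k.sum(dim=1)                                 # sum over keys of phi(k), for the denominator
    num = torch.einsum('bnf,bfd->bnd', q, kv)
    den = torch.einsum('bnf,bf->bn', q, z).unsqueeze(-1)
    return num / (den + eps)

q = k = v = torch.randn(2, 128, 16)
out = taylor_linear_attention(q, k, v)  # (2, 128, 16)
```

The feature dimension grows as d², which is why this line of work uses small per-head dimensions.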
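Second, Grokfast: the paper accelerates grokking by treating the exponential moving average of each parameter's gradient as its "slow" component and amplifying it before the optimizer step. A minimal sketch of the EMA variant, using the paper's α/λ notation; the function name and default values here are illustrative.

```python
# Minimal sketch of Grokfast's EMA gradient filter: maintain an EMA of each gradient
# (the slow component) and add it back, scaled by lamb, before optimizer.step().
import torch

def gradfilter_ema(model, grads=None, alpha=0.98, lamb=2.0):
    if grads is None:  # lazily initialize the EMA with the first gradients seen
        grads = {n: p.grad.detach().clone() for n, p in model.named_parameters()
                 if p.grad is not None}
    for n, p in model.named_parameters():
        if p.grad is not None:
            grads[n] = alpha * grads[n] + (1 - alpha) * p.grad.detach()
            p.grad = p.grad + lamb * grads[n]  # amplify the slow component
    return grads

# Usage inside a training loop (sketch):
#   loss.backward()
#   ema_grads = gradfilter_ema(model, ema_grads)  # ema_grads starts as None
#   optimizer.step(); optimizer.zero_grad()
```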
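Finally, the PEER block from "Mixture of A Million Experts": each expert is a single hidden neuron, and the router retrieves the top-k experts with product keys, scoring two half-queries against two small sub-key tables so that n² experts are searched in roughly O(n) time. A single-head sketch with assumed shapes and names; the paper additionally uses multiple heads and batch-normalized queries.

```python
# Minimal sketch of PEER-style product-key routing to single-neuron experts.
import torch
import torch.nn.functional as F

class PEERSketch(torch.nn.Module):
    def __init__(self, dim, n_sub=32, topk=8):  # n_sub**2 experts in total
        super().__init__()
        self.n_sub, self.topk = n_sub, topk
        self.query = torch.nn.Linear(dim, dim)
        self.sub_keys = torch.nn.Parameter(torch.randn(2, n_sub, dim // 2))
        n_experts = n_sub * n_sub
        self.w_down = torch.nn.Parameter(torch.randn(n_experts, dim))  # each expert: one input neuron
        self.w_up = torch.nn.Parameter(torch.randn(n_experts, dim))    # ...and one output neuron

    def forward(self, x):                          # x: (batch, dim)
        q1, q2 = self.query(x).chunk(2, dim=-1)
        s1 = q1 @ self.sub_keys[0].t()             # (batch, n_sub) scores per half-query
        s2 = q2 @ self.sub_keys[1].t()
        v1, i1 = s1.topk(self.topk, dim=-1)        # prune each half first...
        v2, i2 = s2.topk(self.topk, dim=-1)
        scores = v1[:, :, None] + v2[:, None, :]   # ...then combine: (batch, topk, topk)
        scores, flat = scores.flatten(1).topk(self.topk, dim=-1)
        idx = i1.gather(1, flat // self.topk) * self.n_sub + i2.gather(1, flat % self.topk)
        g = F.softmax(scores, dim=-1)              # router weights over retrieved experts
        h = torch.einsum('bd,bkd->bk', x, self.w_down[idx])  # each expert's single hidden unit
        return torch.einsum('bk,bk,bkd->bd', g, F.gelu(h), self.w_up[idx])

x = torch.randn(4, 64)
out = PEERSketch(64)(x)  # (4, 64)
```

The point of the product-key structure is that only 2·n_sub sub-key scores are ever computed, yet the top-k is exact over all n_sub² expert combinations.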