hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆31 · Updated 2 weeks ago
Alternatives and similar repositories for Terminator
Users interested in Terminator are comparing it to the libraries listed below:
- Explorations into improving ViTArc with Slot Attention ☆37 · Updated 3 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆92 · Updated 5 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆39 · Updated 2 months ago
- ☆53 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆61 · Updated 5 months ago
- Implementation of Agent Attention in Pytorch ☆89 · Updated 6 months ago
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆71 · Updated 2 weeks ago
- ☆70 · Updated 5 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆65 · Updated last week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- Implementation of DreamerV3 in Pytorch ☆42 · Updated 2 months ago
- ☆45 · Updated 10 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆95 · Updated last month
- Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆112 · Updated 3 weeks ago
- HGRN2: Gated Linear RNNs with State Expansion ☆52 · Updated 5 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆72 · Updated 5 months ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆20 · Updated this week
- Implementation of Infini-Transformer in Pytorch ☆109 · Updated 3 weeks ago
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models". ☆36 · Updated 2 months ago
- ☆78 · Updated 9 months ago
- ☆30 · Updated 8 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆77 · Updated 8 months ago
- ☆31 · Updated 9 months ago
- Minimal Implementation of Visual Autoregressive Modelling (VAR) ☆23 · Updated 3 weeks ago
- Evaluating the Mamba architecture on the Othello game ☆44 · Updated 9 months ago
- Mixture of A Million Experts ☆33 · Updated 5 months ago
- ☆37 · Updated 9 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆115 · Updated 5 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆113 · Updated 3 months ago