peytontolbert / Griffin
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
☆13 · Updated last year
Alternatives and similar repositories for Griffin
Users interested in Griffin are comparing it to the libraries listed below.
- Implementation of MambaFormer in PyTorch + Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…" ☆21 · Updated 2 weeks ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …" ☆16 · Updated last year
- Explorations into improving ViTArc with Slot Attention ☆42 · Updated 8 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆55 · Updated 3 months ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆36 · Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆55 · Updated 10 months ago
- RewardAnything: Generalizable Principle-Following Reward Models ☆25 · Updated last month
- ☆32 · Updated 8 months ago
- Lottery Ticket Adaptation ☆39 · Updated 7 months ago
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4 ☆82 · Updated last year
- ☆27 · Updated last year
- A Radial Basis Function (RBF) based Kolmogorov-Arnold Network (KAN) for function approximation ☆29 · Updated last year
- Some mixture-of-experts architecture implementations ☆14 · Updated last year
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 · Updated last month
- [ICLR 2025] Official code release for "Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation" ☆43 · Updated 4 months ago
- Fork of the Flame repo for training some new stuff in development ☆14 · Updated last week
- State Space Models ☆68 · Updated last year
- ☆23 · Updated 9 months ago
- Unofficial implementation of "Attention as an RNN" (https://arxiv.org/pdf/2405.13956); efficient associative parallel prefix scan an… ☆27 · Updated 11 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Updated last month
- ☆32 · Updated last year
- Mamba R1: a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆21 · Updated 2 weeks ago
- C++ and CUDA ops for fused FourierKAN ☆80 · Updated last year
- Implementation of the GateLoop Transformer in PyTorch and JAX ☆89 · Updated last year
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al., with a few convenient wrappers for regression, in PyTorch ☆65 · Updated last month
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP), a new paradigm for exact, efficient, non-parametric inf… ☆24 · Updated 9 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆40 · Updated last year
- Implementation of a transformer for reinforcement learning using `x-transformers` ☆61 · Updated 3 weeks ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning ☆103 · Updated this week
- Code for "Discovering Preference Optimization Algorithms with and for Large Language Models" ☆63 · Updated last year