peytontolbert / Griffin
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
☆12 · Updated 8 months ago
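The title names the two building blocks Griffin mixes: a gated linear recurrence and local (sliding-window) attention. As rough orientation, here is a minimal PyTorch sketch of both under simplifying assumptions; it is not this repository's code, the gate below is a plain convex blend rather than the paper's RG-LRU (which uses a sqrt(1 − a²) scaling and an extra input gate), projections and multi-head structure are omitted, and all names are illustrative.

```python
import torch

def gated_linear_recurrence(x, gate_logits):
    # Sequential reference of a gated linear recurrence:
    # h_t = a_t * h_{t-1} + (1 - a_t) * x_t, with a_t = sigmoid(gate_logits_t).
    a = torch.sigmoid(gate_logits)        # (batch, seq, dim) decay gates in (0, 1)
    h = torch.zeros_like(x[:, 0])         # (batch, dim) recurrent state
    outputs = []
    for t in range(x.shape[1]):
        h = a[:, t] * h + (1.0 - a[:, t]) * x[:, t]
        outputs.append(h)
    return torch.stack(outputs, dim=1)    # (batch, seq, dim)

def local_attention(q, k, v, window=128):
    # Causal sliding-window attention via a banded mask (naive O(T^2) reference).
    seq_len = q.shape[1]
    idx = torch.arange(seq_len)
    mask = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage
x = torch.randn(2, 16, 8)
y_recurrent = gated_linear_recurrence(x, torch.randn(2, 16, 8))
y_local = local_attention(x, x, x, window=4)
```

In the paper these two kinds of temporal-mixing blocks are interleaved across residual layers, with the recurrence carrying long-range state cheaply and the windowed attention handling nearby-token interactions.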
Related projects
Alternatives and complementary repositories for Griffin
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆49 · Updated this week
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks" ☆20 · Updated this week
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆36 · Updated last month
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4. ☆68 · Updated 8 months ago
- ☆25 · Updated 3 weeks ago
- ☆29 · Updated last month
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 2 months ago
- Implementation of the HIERarchical imagination On Structured State Space Sequence Models (HIEROS) paper ☆12 · Updated 3 months ago
- ☆25 · Updated 4 months ago
- C++ and CUDA ops for fused FourierKAN ☆73 · Updated 6 months ago
- ☆52 · Updated this week
- Pytorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5) ☆64 · Updated 6 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆24 · Updated 6 months ago
- A Triton Kernel for incorporating Bi-Directionality in Mamba2 ☆47 · Updated 2 months ago
- Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise (see the sketch after this list) ☆31 · Updated 2 months ago
- Official implementation of the paper "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆20 · Updated 3 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"☆102Updated 3 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…☆77Updated last month
- ☆41 · Updated 7 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 6 months ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 · Updated last week
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆31 · Updated last month
- Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers" ☆59 · Updated last week
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" ☆14 · Updated last month
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆40 · Updated 2 months ago
- Kolmogorov-Arnold Networks (KAN) using Jacobi polynomials instead of B-splines. ☆32 · Updated 6 months ago
- ☆27 · Updated 7 months ago
- This code implements a Radial Basis Function (RBF) based Kolmogorov-Arnold Network (KAN) for function approximation. ☆25 · Updated 4 months ago
- ☆76 · Updated 5 months ago
- ☆21 · Updated last month
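One of the entries above is a Sinkhorn-based router for mixture-of-experts layers. As a pointer to what that means, here is a minimal, illustrative sketch of Sinkhorn iterations for balanced token-to-expert assignment; the function and variable names are assumptions for illustration, not the linked repository's API.

```python
import torch

def sinkhorn_route(logits, n_iters=8):
    # logits: (n_tokens, n_experts) raw router scores.
    # Alternating row/column normalization in log space (Sinkhorn iterations):
    # rows (tokens) are pushed toward unit mass, columns (experts) toward
    # n_tokens / n_experts, i.e. a balanced assignment.
    n_tokens, n_experts = logits.shape
    log_col_target = torch.log(torch.tensor(n_tokens / n_experts))
    log_p = logits
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True) + log_col_target
    return log_p.exp()

# Toy usage: route 16 tokens across 4 experts, then take the balanced top-1 expert.
scores = torch.randn(16, 4)
assignment = sinkhorn_route(scores).argmax(dim=1)
```

The alternating normalizations push the routing matrix toward prescribed marginals (each token routes unit mass, each expert receives roughly n_tokens / n_experts), which is what makes the resulting assignment load-balanced compared with a plain softmax plus argmax.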