peytontolbert / Griffin
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
☆13Updated 11 months ago
Alternatives and similar repositories for Griffin:
Users who are interested in Griffin are comparing it to the libraries listed below:
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆20Updated this week
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"☆51Updated 2 weeks ago
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4.☆76Updated 11 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆52Updated 5 months ago
- Pytorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5)☆73Updated 9 months ago
- C++ and CUDA ops for fused FourierKAN☆75Updated 9 months ago
- Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise☆32Updated 5 months ago
- Code repository for Trajectory Flow Matching☆51Updated 3 months ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆24Updated this week
- Official implementation of the paper: "DeciMamba: Exploring the Length Extrapolation Potential of Mamba"☆23Updated 6 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning☆22Updated last year
- Transformer model based on the Kolmogorov–Arnold Network (KAN), an alternative to the Multi-Layer Perceptron (MLP)☆27Updated 2 months ago
- Implementation of the HIERarchical imagination On Structured State Space Sequence Models (HIEROS) paper☆15Updated 7 months ago
- Parallelizing non-linear sequential models over the sequence length☆50Updated 3 weeks ago
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning"☆14Updated 2 weeks ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group☆36Updated 4 months ago
- Kolmogorov-Arnold Networks (KAN) using Jacobi polynomials instead of B-splines.☆36Updated 9 months ago
- This code implements a Radial Basis Function (RBF) based Kolmogorov-Arnold Network (KAN) for function approximation.☆26Updated 8 months ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien…☆78Updated 2 weeks ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model☆19Updated 4 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…☆96Updated 5 months ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…☆93Updated 2 weeks ago
- Hierarchical State Space Models☆43Updated 10 months ago
- A repository for DenseSSMs☆86Updated 10 months ago
- State Space Models☆64Updated 9 months ago