peytontolbert / Griffin

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

☆13

Alternatives and similar repositories for Griffin

Users who are interested in Griffin are comparing it to the libraries listed below.

kyegomez / MambaFormer
Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…
☆21 Updated 2 weeks ago
kyegomez / MobileVLM
Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …
☆16 Updated last year
lucidrains / vit-arc-slot
Explorations into improving ViTArc with Slot Attention
☆42 Updated 8 months ago
kyegomez / Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆55 Updated 3 months ago
lucidrains / scaling-vin-pytorch
Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group
☆36 Updated 9 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55 Updated 10 months ago
WisdomShell / RewardAnything
RewardAnything: Generalizable Principle-Following Reward Models
☆25 Updated last month
IdoAmos / not-from-scratch
☆32 Updated 8 months ago
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39 Updated 7 months ago
TariqAHassan / S4Torch
PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4.
☆82 Updated last year
Doraemonzzz / hgru-pytorch
☆27 Updated last year
sidhu2690 / RBF-KAN
This code implements a Radial Basis Function (RBF) based Kolmogorov-Arnold Network (KAN) for function approximation.
☆29 Updated last year
swiss-ai / MoE
some mixture of experts architecture implementations
☆14 Updated last year
SamsungSAILMontreal / nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025]
☆19 Updated last month
Itamarzimm / UnifiedImplicitAttnRepr
[ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
☆43 Updated 4 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆14 Updated last week
badripatro / mamba360
State Space Models
☆68 Updated last year
Doraemonzzz / hgru2-pytorch
☆23 Updated 9 months ago
claCase / Attention-as-RNN
Non-official implementation of "Attention as an RNN" from https://arxiv.org/pdf/2405.13956, efficient associative parallel prefix scan an…
☆27 Updated 11 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated last month
google-deepmind / spectral_ssm
☆32 Updated last year
The-Swarm-Corporation / Mamba-R1
Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…
☆21 Updated 2 weeks ago
GistNoesis / FusedFourierKAN
C++ and Cuda ops for fused FourierKAN
☆80 Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆89 Updated last year
lucidrains / hl-gauss-pytorch
The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch
☆65 Updated last month
Rose-STL-Lab / AutoSTPP
Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf…
☆24 Updated 9 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40 Updated last year
lucidrains / x-transformers-rl
Implementation of a transformer for reinforcement learning using `x-transformers`
☆61 Updated 3 weeks ago
CLAIRE-Labo / EvoTune
Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning.
☆103 Updated this week
luchris429 / DiscoPOP
Code for Discovering Preference Optimization Algorithms with and for Large Language Models
☆63 Updated last year