annosubmission / GRC-CacheLinks

☆16

Alternatives and similar repositories for GRC-Cache

Users that are interested in GRC-Cache are comparing it to the libraries listed below

Sorting:

OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆65Updated last year
WailordHe / DenseSSM
A repository for DenseSSMs
☆89Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated last year
Doraemonzzz / hgru2-pytorch
☆23Updated last year
MambaMixer / M2
☆48Updated last year
chuanyang-Zheng / DAPE
The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
☆39Updated last year
krafton-ai / mambaformer-icl
MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248
☆57Updated last year
Benjamin-Walker / selective-ssms-and-linear-cdes
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆15Updated 10 months ago
Eliyas0007 / Pytorch-Intention
Unofficial implementation of paper : Exploring the Space of Key-Value-Query Models with Intention
☆12Updated 2 years ago
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49Updated 2 years ago
siyuanseever / llama2Rnn.c
☆12Updated last year
Doraemonzzz / hgru-pytorch
☆29Updated last year
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆19Updated 6 months ago
HelmholtzAI-FZJ / flex_gen
☆19Updated 10 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 10 months ago
yangjackie / Topics-on-diffusion-generative-models
☆27Updated last month
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated 2 months ago
piotrpiekos / MoSA
User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…
☆28Updated 6 months ago
IBM / selective-dense-state-space-model
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆15Updated 2 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56Updated 2 years ago
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14Updated last year
HazyResearch / prefix-linear-attention
☆57Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
gmongaras / Cottention_Transformer
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
☆20Updated last week
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆33Updated 2 years ago
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆78Updated 2 years ago
fla-org / flash-bidirectional-linear-attention
Triton implement of bi-directional (non-causal) linear attention
☆56Updated 9 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
kyegomez / MAGVIT2
Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION — TOKENIZER IS KEY TO VISUAL GENERATION"
☆14Updated last year
PKU-ML / non_neg
Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning
☆46Updated last year