johanwind / wind_rwkvLinks

☆26

Alternatives and similar repositories for wind_rwkv

Users that are interested in wind_rwkv are comparing it to the libraries listed below

Sorting:

BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
sustcsonglin / mamba-triton
☆48Updated last year
00ffcc / chunkRWKV6
continous batching and parallel acceleration for RWKV6
☆23Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated last month
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16Updated last year
Doraemonzzz / nanoTransNormer
☆11Updated 2 years ago
HazyResearch / prefix-linear-attention
☆56Updated last year
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆36Updated 5 months ago
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆50Updated 3 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆65Updated last year
sustcsonglin / gated_linear_attention_layer
☆31Updated last year
recursal / RADLADS-paper
RADLADS training code
☆29Updated 5 months ago
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆36Updated 11 months ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆17Updated last year
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Updated last year
BBuf / flash-rwkv
☆32Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆29Updated 8 months ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18Updated last year
OpenMOSE / RWKV-Infer
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆45Updated last week
Doraemonzzz / hgru-pytorch
☆27Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆39Updated last year
LeC-Z / RWKV-nonogram
A 20M RWKV v6 can do nonogram
☆14Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 3 months ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆17Updated 7 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆54Updated last year