00ffcc / chunkRWKV6Links

continous batching and parallel acceleration for RWKV6

☆22

Alternatives and similar repositories for chunkRWKV6

Users that are interested in chunkRWKV6 are comparing it to the libraries listed below

Sorting:

fla-org / fla-zoo
Flash-Linear-Attention models beyond language
☆20Updated 3 months ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22Updated last year
BBuf / flash-rwkv
☆32Updated last year
johanwind / wind_rwkv
☆27Updated 4 months ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66Updated last year
leloykun / flash-attention-minimal
Flash Attention in 300-500 lines of CUDA/C++
☆36Updated 3 months ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆74Updated 9 months ago
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated 3 months ago
HazyResearch / prefix-linear-attention
☆57Updated last year
Dao-AILab / grouped-latent-attention
☆132Updated 6 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
recursal / RADLADS-paper
RADLADS training code
☆34Updated 7 months ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆38Updated 7 months ago
berlino / gated_linear_attention
☆106Updated last year
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87Updated last year
sustcsonglin / mamba-triton
☆50Updated last year
dame-cell / Triformer
Transformers components but in Triton
☆34Updated 6 months ago
OpenSparseLLMs / Linear-MoE
☆120Updated 6 months ago
OpenSparseLLMs / MoM
☆110Updated 2 months ago
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆51Updated 4 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 5 months ago
OpenSparseLLMs / Linearization
☆61Updated 5 months ago
sjelassi / transformers_ssm_copy
☆35Updated last year
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆84Updated last month
IBM / selective-dense-state-space-model
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆15Updated 2 months ago
AwesomeSeq / Comba-triton
☆49Updated 5 months ago
epfml / dynamic-sparse-flash-attention
☆150Updated 2 years ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78Updated last year
fla-org / flash-bidirectional-linear-attention
Triton implement of bi-directional (non-causal) linear attention
☆56Updated 10 months ago