00ffcc / chunkRWKV6
Continuous batching and parallel acceleration for RWKV6
☆24 · Updated 11 months ago
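The repo's actual kernels (and RWKV6's data-dependent per-channel decay) are more involved, but the chunkwise-parallel idea behind chunked RWKV/linear-attention inference can be sketched in plain PyTorch. Below is a minimal sketch for unnormalized causal linear attention, split into an exact intra-chunk part and a recurrent inter-chunk state; function and variable names are illustrative assumptions, not chunkRWKV6's API.

```python
# Hedged sketch of chunkwise-parallel causal linear attention.
# NOT chunkRWKV6's API; RWKV6's decay/gating terms are omitted.
import torch

def chunked_linear_attention(q, k, v, chunk_size=64):
    """q, k, v: (batch, seq_len, dim); seq_len must be a multiple of chunk_size."""
    b, t, d = q.shape
    n = t // chunk_size
    q = q.view(b, n, chunk_size, d)
    k = k.view(b, n, chunk_size, d)
    v = v.view(b, n, chunk_size, d)

    # Running inter-chunk state: sum of k^T v over all previous chunks.
    state = torch.zeros(b, d, d, dtype=q.dtype, device=q.device)
    # Causal mask for the quadratic intra-chunk part.
    mask = torch.tril(torch.ones(chunk_size, chunk_size,
                                 dtype=torch.bool, device=q.device))
    out = torch.empty_like(v)

    for i in range(n):
        qi, ki, vi = q[:, i], k[:, i], v[:, i]
        # Intra-chunk: exact causal attention inside the chunk, fully parallel.
        scores = (qi @ ki.transpose(-1, -2)).masked_fill(~mask, 0.0)
        # Inter-chunk: all earlier chunks contribute through the running state.
        out[:, i] = scores @ vi + qi @ state
        state = state + ki.transpose(-1, -2) @ vi

    return out.view(b, t, d)
```

Chunking trades the fully sequential token-by-token recurrence for per-chunk matmuls, which is what makes this style of kernel amenable to parallel acceleration and batched serving.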
Alternatives and similar repositories for chunkRWKV6
Users interested in chunkRWKV6 are comparing it to the libraries listed below.
- Flash-Linear-Attention models beyond language ☆16 · Updated this week
- ☆21 · Updated 3 months ago
- ☆31 · Updated last year
- RADLADS training code ☆24 · Updated last month
- ☆18 · Updated last week
- ☆22 · Updated last year
- RWKV-X is a linear-complexity hybrid language model based on the RWKV architecture, integrating sparse attention to improve the model's l… ☆39 · Updated last month
- ☆114 · Updated 3 weeks ago
- A large-scale RWKV v6/v7 (World, PRWKV, Hybrid-RWKV) inference engine. Capable of inference combining multiple states (pseudo-MoE). Easy to de… ☆38 · Updated 3 weeks ago
- ☆104 · Updated 2 weeks ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆65 · Updated last year
- ☆85 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆178 · Updated 2 weeks ago
- Transformer components, but in Triton ☆34 · Updated last month
- Here we will test various linear attention designs. ☆59 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆70 · Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ☆100 · Updated 3 weeks ago
- Awesome Triton Resources ☆31 · Updated last month
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆47 · Updated 7 months ago
- Efficient Triton implementation of Native Sparse Attention. ☆168 · Updated last month
- ☆22 · Updated this week
- ☆51 · Updated 3 months ago
- ☆55 · Updated 11 months ago
- qwen-nsa ☆67 · Updated 2 months ago
- ☆48 · Updated last year
- ☆51 · Updated 7 months ago
- Linear Attention Sequence Parallelism (LASP) ☆84 · Updated last year
- ☆105 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆131 · Updated last week
- Triton implementation of bi-directional (non-causal) linear attention (a plain-PyTorch sketch of the computation follows below) ☆50 · Updated 4 months ago
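For the bi-directional entry above: without a causal mask, linear attention collapses into two matmuls, which is the pattern such a Triton kernel fuses. This is an assumption-based reference of what is computed, assuming an unnormalized formulation with feature maps already applied to q and k; it is not that repo's API.

```python
# Hedged PyTorch reference for bi-directional (non-causal) linear attention.
import torch

def noncausal_linear_attention(q, k, v):
    """q, k, v: (batch, seq_len, dim); every position attends to every other."""
    # With no causal mask, the key-value summary K^T V is a single global
    # (dim x dim) matrix, so the output is O = Q (K^T V):
    # linear rather than quadratic in sequence length.
    kv = k.transpose(-1, -2) @ v   # (batch, dim, dim)
    return q @ kv                  # (batch, seq_len, dim)
```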