00ffcc / chunkRWKV6
Continuous batching and parallel acceleration for RWKV6
☆24 · Updated 11 months ago
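For context on what "chunk"-style acceleration means here, below is a minimal sketch of chunkwise (blockwise-parallel) linear attention, the general technique behind repos of this kind. It is hedged in two ways: it uses plain, non-decaying linear attention for clarity (RWKV6 adds data-dependent per-channel decay on top of this recurrence), and every name in it is hypothetical rather than taken from the chunkRWKV6 code.

```python
# Sketch: chunkwise causal linear attention, o_t = sum_{s<=t} (q_t . k_s) v_s.
# Illustrative only; not the chunkRWKV6 implementation.
import torch

def chunk_linear_attention(q, k, v, chunk_size=64):
    # q, k, v: (T, d) single-head tensors; assumes T % chunk_size == 0.
    T, d = q.shape
    n = T // chunk_size
    q, k, v = (x.view(n, chunk_size, d) for x in (q, k, v))
    state = torch.zeros(d, d)                        # running sum of k_s v_s^T from past chunks
    mask = torch.tril(torch.ones(chunk_size, chunk_size))  # causal mask within a chunk
    out = torch.empty(n, chunk_size, d)
    for i in range(n):
        inter = q[i] @ state                         # contribution of all previous chunks
        intra = ((q[i] @ k[i].T) * mask) @ v[i]      # causal attention inside the chunk
        out[i] = inter + intra
        state = state + k[i].T @ v[i]                # fold this chunk into the recurrent state
    return out.reshape(T, d)

# Sanity check against the naive quadratic form.
q, k, v = (torch.randn(128, 16) for _ in range(3))
naive = (torch.tril(torch.ones(128, 128)) * (q @ k.T)) @ v
assert torch.allclose(chunk_linear_attention(q, k, v, 32), naive, atol=1e-3)
```

The point of chunking is that the intra-chunk term is a dense matmul that parallelizes across the sequence, while only a small d×d state is carried serially between chunks, which is also what makes chunk-granular continuous batching feasible.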
Alternatives and similar repositories for chunkRWKV6
Users interested in chunkRWKV6 are comparing it to the libraries listed below.
- ☆31 · Updated last year
- ☆21 · Updated 2 months ago
- ☆54 · Updated 10 months ago
- ☆93 · Updated 2 weeks ago
- Flash-Linear-Attention models beyond language ☆14 · Updated this week
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆64 · Updated last year
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆37 · Updated last month
- ☆83 · Updated last month
- Awesome Triton Resources ☆28 · Updated last month
- 🔥 A minimal training framework for scaling FLA models ☆146 · Updated 3 weeks ago
- ☆22 · Updated last year
- Here we will test various linear attention designs. ☆58 · Updated last year
- ☆103 · Updated last year
- ☆93 · Updated last week
- ☆46 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆69 · Updated 11 months ago
- The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free ☆40 · Updated 3 weeks ago
- ☆47 · Updated 2 months ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆83 · Updated last year
- Transformers components but in Triton ☆33 · Updated 3 weeks ago
- qwen-nsa ☆66 · Updated last month
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆121 · Updated last week
- RADLADS training code ☆22 · Updated 3 weeks ago
- SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference ☆46 · Updated 6 months ago
- Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton ☆29 · Updated last week
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆117 · Updated 9 months ago
- A large-scale RWKV v6, v7 (World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to de… ☆35 · Updated last week
- ☆17 · Updated last month
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆104 · Updated 3 weeks ago