00ffcc / chunkRWKV6
Continuous batching and parallel acceleration for RWKV6
☆24 · Updated 8 months ago
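As a rough illustration of what "continuous batching" means for a recurrent model like RWKV6, here is a minimal, self-contained Python sketch. Everything in it is hypothetical: `rwkv_step` is a toy stand-in for one RWKV6 token step (it is not this repository's API), and a real implementation would run the per-slot decode loop as a single batched GPU kernel over all active recurrent states. Only the queue/slot bookkeeping, where a finished sequence's slot is refilled immediately instead of waiting for the whole batch to drain, is the point being shown.

```python
# Minimal sketch of continuous batching over a recurrent (RWKV-style) decoder.
# `rwkv_step` is a hypothetical placeholder for a real RWKV6 token step that
# maps (token, state) -> (next_token, new_state).
from collections import deque


def rwkv_step(token: int, state: int) -> tuple[int, int]:
    # Toy stand-in: fold the token into the recurrent state and emit a token.
    state = (state * 31 + token) % 1000
    return state % 50, state


def continuous_batch(prompts, max_slots=4, max_new=8, eos=0):
    """Run requests through a fixed number of batch slots, refilling a slot
    from the queue the moment its sequence finishes (continuous batching),
    rather than waiting for every sequence in the batch to complete."""
    pending = deque(enumerate(prompts))
    slots, free, done = {}, deque(range(max_slots)), {}
    while pending or slots:
        while pending and free:                    # admit requests into free slots
            req, prompt = pending.popleft()
            tok, state = 0, 0
            for t in prompt:                       # prefill the recurrent state
                tok, state = rwkv_step(t, state)
            slots[free.popleft()] = {"req": req, "tok": tok,
                                     "state": state, "out": [tok]}
        for sid in list(slots):                    # one "batched" decode step;
            s = slots[sid]                         # real code batches all slots
            if s["tok"] == eos or len(s["out"]) >= max_new:
                done[s["req"]] = s["out"]
                del slots[sid]
                free.append(sid)                   # slot is reused immediately
                continue
            s["tok"], s["state"] = rwkv_step(s["tok"], s["state"])
            s["out"].append(s["tok"])
    return [done[i] for i in range(len(prompts))]


if __name__ == "__main__":
    print(continuous_batch([[3, 1, 4], [1, 5], [9, 2, 6], [5, 3], [5]]))
```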
Alternatives and similar repositories for chunkRWKV6:
Users interested in chunkRWKV6 are comparing it to the libraries listed below.
- 🔥 A minimal training framework for scaling FLA models ☆75 · Updated this week
- Here we will test various linear attention designs ☆59 · Updated 10 months ago
- Large-scale RWKV v6/v7 (World, ARWKV) inference. Capable of combining multiple states at inference time (pseudo-MoE). Easy to deploy on docke… ☆31 · Updated 2 weeks ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN from the NeurIPS 2023 paper, Hierarchically Gated Recurrent Neural Network for Se… ☆64 · Updated 10 months ago
- Transformer components, but in Triton ☆32 · Updated 3 months ago
- Linear Attention Sequence Parallelism (LASP) ☆79 · Updated 9 months ago
- Stick-breaking attention ☆48 · Updated this week
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆44 · Updated 3 months ago
- Xmixers: a collection of SOTA efficient token/channel mixers ☆11 · Updated 4 months ago
- PyTorch bindings for CUTLASS grouped GEMM ☆70 · Updated 4 months ago
- Triton implementation of FlashAttention2 that adds custom masks ☆99 · Updated 6 months ago
- Contextual Position Encoding, but with some custom CUDA kernels: https://arxiv.org/abs/2405.18719 ☆22 · Updated 9 months ago
- A 20M-parameter RWKV v6 can solve nonograms ☆12 · Updated 4 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆39 · Updated 4 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 9 months ago