BBuf / flash-rwkv
☆30 · Updated 8 months ago
Alternatives and similar repositories for flash-rwkv:
Users interested in flash-rwkv are comparing it to the libraries listed below.
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 · Updated 8 months ago
- Continuous batching and parallel acceleration for RWKV6 ☆24 · Updated 7 months ago
- Here we will test various linear attention designs. ☆58 · Updated 9 months ago
- ☆22 · Updated last year
- ☆65 · Updated last week
- Odysseus: Playground of LLM Sequence Parallelism ☆64 · Updated 7 months ago
- Awesome Triton Resources ☆19 · Updated 2 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆38 · Updated 11 months ago
- Linear Attention Sequence Parallelism (LASP) ☆77 · Updated 8 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆42 · Updated last week
- 🔥 A minimal training framework for scaling FLA models ☆55 · Updated this week
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 · Updated 9 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆44 · Updated last year
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- ☆16 · Updated last month
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆40 · Updated 2 months ago
- Transformer components, but in Triton ☆31 · Updated 2 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆28 · Updated 8 months ago
- DPO, but faster 🚀 ☆33 · Updated 2 months ago
- ☆99 · Updated 11 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated 8 months ago
- ☆45 · Updated last year
- Stick-breaking attention ☆42 · Updated last month
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels (https://arxiv.org/abs/2405.18719) ☆22 · Updated 8 months ago
- ☆18 · Updated this week
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆18 · Updated 3 months ago
- ☆47 · Updated last year
- ☆49 · Updated 7 months ago