zhiyuan1i / TorchRWKV
RWKV6 in native PyTorch and Triton :)
☆11, updated 11 months ago
Alternatives and similar repositories for TorchRWKV
Users who are interested in TorchRWKV are comparing it to the libraries listed below.
- Here we will test various linear attention designs. ☆60, updated last year
- Implementation of a Light Recurrent Unit in Pytorch ☆48, updated 9 months ago
- A large-scale RWKV v6, v7 (World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to de… ☆38, updated last week
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆37, updated 4 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28, updated 2 months ago
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆38, updated this week
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24, updated last year
- Modified Score-Entropy-Discrete-Diffusion to build a character-level ML model and integrate with Oxen ☆16, updated last year
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆42, updated this week
- ☆32, updated last year
- JAX Scalify: end-to-end scaled arithmetics ☆16, updated 8 months ago
- RADLADS training code ☆24, updated 2 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆27, updated 4 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33, updated 11 months ago
- ☆22, updated 3 weeks ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38, updated last month
- ☆23, updated 9 months ago
- State tuning tunes the state ☆34, updated 5 months ago
- RWKV model implementation ☆38, updated 2 years ago
- RWKV-7 mini ☆11, updated 3 months ago
- GoldFinch and other hybrid transformer components ☆46, updated 11 months ago
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro… ☆34, updated this week
- Implementation of GateLoop Transformer in Pytorch and Jax ☆89, updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆87, updated last year
- Triton implementation of bi-directional (non-causal) linear attention ☆51, updated 5 months ago
- RWKV-7: Surpassing GPT ☆92, updated 7 months ago
- ☆27, updated last year
- Fast modular code to create and train cutting edge LLMs ☆67, updated last year
- Here we collect trick questions and failed tasks for open source LLMs to improve them. ☆32, updated 2 years ago
- ☆37, updated 2 months ago