RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embedding.
☆60 · Mar 17, 2025 · Updated last year
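To make the "RNN that trains like a GPT" claim concrete, here is a minimal sketch of the WKV time-mixing computation as described in the RWKV-4 paper. It is simplified to a single channel, with no receptance gate and no token shift. The function names `wkv_parallel` and `wkv_recurrent` are hypothetical, as are the parameters `w` (positive decay), `u` (bonus for the current token), and `k`/`v` (per-token keys and values). None of these are this repository's API. The first form computes each output as a decayed weighted average over all earlier tokens, independently for every position, which is what makes GPT-style parallel training possible. The second form carries only two running sums from token to token, so inference state stays the same size however long the context grows.

```python
import math
from typing import List, Sequence


def wkv_parallel(w: float, u: float, k: Sequence[float], v: Sequence[float]) -> List[float]:
    """Attention-like form: every position is computed from the inputs alone."""
    out = []
    for t in range(len(k)):
        # The current token gets the extra "bonus" u instead of a decay.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # Past token i is down-weighted by exp(-w) per step of distance.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(w: float, u: float, k: Sequence[float], v: Sequence[float]) -> List[float]:
    """RNN form: a constant-size state (a, b) is carried across tokens."""
    a, b = 0.0, 0.0  # running decayed sums of exp(k)*v and exp(k)
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current token into the state for the next step.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out


if __name__ == "__main__":
    k = [0.1, -0.3, 0.5, 0.2]
    v = [1.0, 2.0, -1.0, 0.5]
    par = wkv_parallel(0.5, 0.3, k, v)
    rec = wkv_recurrent(0.5, 0.3, k, v)
    # Both forms should agree up to floating-point rounding.
    assert all(abs(p - r) < 1e-9 for p, r in zip(par, rec))
    print(rec)
```

The naive exponentials here can overflow for large keys. Real RWKV implementations typically rescale by a running maximum exponent to keep the sums numerically stable, and this sketch omits that step.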
Alternatives and similar repositories for RWKV-LM
Users that are interested in RWKV-LM are comparing it to the libraries listed below.
- UQ: Assessing Language Models on Unsolved Questions ☆30 · Aug 26, 2025 · Updated 6 months ago
- Manipulates mobile phones just like how you would. Official code for "MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficien… ☆27 · Oct 10, 2025 · Updated 5 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆19 · Nov 4, 2025 · Updated 4 months ago
- Codes for Evolving Plastic ANNs ☆14 · Dec 18, 2022 · Updated 3 years ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated last year
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- Official Repository of Native Parallel Reasoner ☆103 · Feb 5, 2026 · Updated last month
- GoldFinch and other hybrid transformer components ☆45 · Jul 20, 2024 · Updated last year
- ☆18 · Apr 18, 2025 · Updated 11 months ago
- Resa: Transparent Reasoning Models via SAEs ☆48 · Sep 23, 2025 · Updated 6 months ago
- Martingale posterior neural networks for fast sequential decision making @ Neurips 2025 ☆23 · Nov 13, 2025 · Updated 4 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆88 · Sep 12, 2025 · Updated 6 months ago
- [ACM MM2025] The official repository for the RealSyn dataset ☆40 · Dec 14, 2025 · Updated 3 months ago
- Jax Codebase for Evolutionary Strategies at the Hyperscale ☆231 · Feb 27, 2026 · Updated 3 weeks ago
- Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)" ☆153 · Mar 15, 2026 · Updated last week
- The official implementation of the paper SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder ☆19 · Oct 19, 2025 · Updated 5 months ago
- JSON RPC v2.0 Sans I/O ☆11 · Updated this week
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆46 · Jul 24, 2025 · Updated 8 months ago
- Mamba support for transformer lens ☆19 · Sep 17, 2024 · Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Oct 3, 2024 · Updated last year
- ☆23 · Dec 28, 2024 · Updated last year
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆14 · Mar 17, 2025 · Updated last year
- ☆14 · Apr 14, 2025 · Updated 11 months ago
- Official Implementation of UTrice: Unifying Primitives in Differentiable Ray Tracing and Rasterization via Triangles for Particle-Based 3… ☆30 · Jan 13, 2026 · Updated 2 months ago
- This is a simple torch implementation of the high performance Multi-Query Attention ☆16 · Aug 23, 2023 · Updated 2 years ago
- ☆15 · Mar 20, 2025 · Updated last year
- This project is the official implementation of 'DreamOmni3: Scribble-based Editing and Generation' ☆39 · Dec 30, 2025 · Updated 2 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆56 · Updated this week
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening ☆70 · May 18, 2025 · Updated 10 months ago
- Benchmarking general decision-making with open & random worlds ☆20 · Updated this week
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Nov 26, 2025 · Updated 3 months ago
- ☆23 · Mar 7, 2025 · Updated last year
- A Comprehensive Dataset for Advanced Image Generation and Editing ☆31 · Oct 2, 2025 · Updated 5 months ago
- ☆35 · Jan 25, 2026 · Updated 2 months ago
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆29 · Jun 3, 2025 · Updated 9 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation ☆34 · May 28, 2025 · Updated 9 months ago
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆40 · Jun 4, 2025 · Updated 9 months ago
- The opinionated high performance professional-grade AI package for Go ☆25 · Updated this week
- EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets ☆10 · Dec 12, 2023 · Updated 2 years ago