bcml-labs / rosa-plus
ROSA+: RWKV's ROSA implementation with fallback statistical predictor
☆31 Updated 3 months ago
Alternatives and similar repositories for rosa-plus
Users who are interested in rosa-plus are comparing it to the libraries listed below.
- ROSA-Tuning ☆65 Updated last week
- A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy… ☆47 Updated 3 months ago
- ☆71 Updated 7 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 Updated 9 months ago
- Work in progress. ☆79 Updated 2 months ago
- RWKV-7: Surpassing GPT ☆104 Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 Updated 11 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆54 Updated 3 weeks ago
- ☆26 Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆115 Updated 9 months ago
- Universal Reasoning Model ☆122 Updated 3 weeks ago
- ☆41 Updated 9 months ago
- ☆13 Updated last year
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 Updated 3 months ago
- A repository for research on medium sized language models. ☆77 Updated last year
- ☆67 Updated 10 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆87 Updated 4 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 Updated 4 months ago
- ☆163 Updated 7 months ago
- ☆119 Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆186 Updated last year
- Official repo of paper LM2 ☆46 Updated 11 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆117 Updated 2 weeks ago
- ☆82 Updated last year
- H-Net Dynamic Hierarchical Architecture ☆81 Updated 5 months ago
- QuIP quantization ☆61 Updated last year
- ☆29 Updated 3 months ago
- Official implementation for Training LLMs with MXFP4 ☆118 Updated 9 months ago
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding ☆203 Updated 3 weeks ago