codefuse-ai / rodimus
☆171 · Updated 4 months ago
Alternatives and similar repositories for rodimus
Users interested in rodimus are comparing it to the repositories listed below.
- A repository for research on medium sized language models. ☆78 · Updated last year
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆46 · Updated last month
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆40 · Updated 3 weeks ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated 4 months ago
- RWKV-7: Surpassing GPT ☆94 · Updated 9 months ago
- ☆26 · Updated 2 months ago
- A large-scale RWKV v7 (World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy… ☆42 · Updated last week
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- RADLADS training code ☆27 · Updated 3 months ago
- ☆38 · Updated 4 months ago
- ☆69 · Updated 10 months ago
- ☆85 · Updated last year
- Official Code Repository for the paper "Key-value memory in the brain" ☆28 · Updated 6 months ago
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆104 · Updated 3 months ago
- ☆66 · Updated 2 months ago
- Here we will test various linear attention designs. ☆62 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆56 · Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆59 · Updated last year
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆109 · Updated 4 months ago
- ☆38 · Updated last year
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluation ☆51 · Updated this week
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 · Updated 4 months ago
- ☆53 · Updated 10 months ago
- Training hybrid models for dummies. ☆25 · Updated 7 months ago
- ☆61 · Updated 5 months ago
- Official repo of the dataset-decomposition paper [NeurIPS 2024] ☆19 · Updated 7 months ago
- ☆89 · Updated 9 months ago
- Official code repo for the paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆23 · Updated 4 months ago