codefuse-ai / rodimus
☆14 Updated last month
Alternatives and similar repositories for rodimus
Users who are interested in rodimus are comparing it to the libraries listed below.
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆32 Updated 9 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 Updated last month
- GoldFinch and other hybrid transformer components ☆10 Updated 3 weeks ago
- ☆19 Updated this week
- A repository for research on medium sized language models. ☆76 Updated last year
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆37 Updated last month
- ☆13 Updated 5 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 Updated 8 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆35 Updated 2 months ago
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆27 Updated last week
- Experiments on the impact of depth in transformers and SSMs. ☆30 Updated 7 months ago
- GoldFinch and other hybrid transformer components ☆45 Updated 10 months ago
- Here we will test various linear attention designs. ☆58 Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆101 Updated last month
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆32 Updated 2 months ago
- ☆9 Updated last month
- Reinforcing General Reasoning without Verifiers ☆51 Updated last week
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆32 Updated 2 months ago
- A large-scale RWKV v6, v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to de… ☆35 Updated last week
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- ☆34 Updated 11 months ago
- Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks" ☆20 Updated last month
- ☆49 Updated 7 months ago
- Lottery Ticket Adaptation ☆39 Updated 6 months ago
- ☆15 Updated 2 months ago
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 Updated 2 months ago
- RWKV-7: Surpassing GPT ☆88 Updated 6 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 4 months ago
- ☆48 Updated 10 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 Updated last year