yuanzhoulvpi2017 / mamba4transformers
☆13 Updated last year
Alternatives and similar repositories for mamba4transformers
Users who are interested in mamba4transformers are comparing it to the libraries listed below
- ☆53 Updated 4 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆55 Updated 6 months ago
- ☆49 Updated 5 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆111 Updated last week
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆66 Updated last year
- One-shot Entropy Minimization ☆187 Updated 6 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆93 Updated last month
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆39 Updated last year
- A comprehensive and efficient long-context model evaluation framework ☆27 Updated 3 weeks ago
- ☆134 Updated 9 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆40 Updated last year
- ☆26 Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆88 Updated 9 months ago
- ☆65 Updated last year
- This is the implementation of LeCo ☆31 Updated 10 months ago
- ☆167 Updated last year
- A repository for DenseSSMs ☆89 Updated last year
- ☆21 Updated last week
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆91 Updated last month
- ☆38 Updated 3 months ago
- ☆62 Updated 5 months ago
- ☆30 Updated 6 months ago
- RewardAnything: Generalizable Principle-Following Reward Models ☆45 Updated 6 months ago
- ☆85 Updated 8 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆146 Updated 8 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆31 Updated 8 months ago
- Recent Advances on MLLM's Reasoning Ability ☆26 Updated 8 months ago
- ☆151 Updated last year
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training ☆45 Updated 3 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”. ☆38 Updated 11 months ago