yuanzhoulvpi2017 / mamba4transformers
☆13 · Updated last year
Alternatives and similar repositories for mamba4transformers
Users interested in mamba4transformers are comparing it to the libraries listed below.
- ☆51 · Updated 3 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆91 · Updated this week
- ☆23 · Updated 7 months ago
- ☆25 · Updated last year
- ☆148 · Updated last year
- Exploring whether LLMs perform case-based or rule-based reasoning ☆30 · Updated last year
- ☆65 · Updated 11 months ago
- ☆104 · Updated 11 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆87 · Updated 9 months ago
- One-shot Entropy Minimization ☆187 · Updated 5 months ago
- ☆46 · Updated 5 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆109 · Updated this week
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆66 · Updated last year
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆83 · Updated 7 months ago
- Code for the ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning… ☆42 · Updated 3 months ago
- [EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆66 · Updated 7 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆138 · Updated last year
- ☆166 · Updated last year
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆41 · Updated last month
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆39 · Updated last year
- A repository for DenseSSMs ☆89 · Updated last year
- The implementation of LeCo ☆31 · Updated 9 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 · Updated 9 months ago
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking ☆39 · Updated 10 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆231 · Updated last month
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. ☆240 · Updated 3 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆147 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆148 · Updated 9 months ago
- ☆131 · Updated 8 months ago
- ☆19 · Updated 4 months ago