yuanzhoulvpi2017 / mamba4transformers
☆13 · Updated last year
Alternatives and similar repositories for mamba4transformers
Users interested in mamba4transformers are comparing it to the repositories listed below.
- ☆57 · Updated 5 months ago
- ☆26 · Updated last year
- ☆49 · Updated 6 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆56 · Updated 7 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆53 · Updated last year
- This is the implementation of LeCo ☆31 · Updated 11 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆40 · Updated last year
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆94 · Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆86 · Updated 9 months ago
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆67 · Updated last year
- PyTorch implementation of "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆62 · Updated last year
- ☆104 · Updated last year
- One-shot Entropy Minimization ☆187 · Updated 6 months ago
- The official GitHub repository for the paper "R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation" (EMNLP 2024 Fin… ☆38 · Updated last year
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆28 · Updated 10 months ago
- ☆25 · Updated 8 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆138 · Updated last year
- ☆62 · Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆88 · Updated 10 months ago
- ☆126 · Updated 7 months ago
- ☆65 · Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 · Updated last year
- ☆152 · Updated last year
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆68 · Updated 9 months ago
- ☆32 · Updated 7 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆32 · Updated 9 months ago
- ☆46 · Updated 7 months ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆44 · Updated 3 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆111 · Updated last month
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) ☆33 · Updated 11 months ago