yuanzhoulvpi2017 / mamba4transformers
☆12Updated last year
Alternatives and similar repositories for mamba4transformers
Users that are interested in mamba4transformers are comparing it to the libraries listed below
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆50Updated last month
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆103Updated last week
- This is the implementation of LeCo☆31Updated 6 months ago
- ☆23Updated last year
- This is a repo showcasing the use of MCTS with LLMs to solve gsm8k problems☆85Updated 4 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆74Updated 4 months ago
- ☆63Updated last year
- Official completion of “Training on the Benchmark Is Not All You Need”.☆35Updated 6 months ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'☆27Updated 2 months ago
- ☆18Updated 3 weeks ago
- RL Scaling and Test-Time Scaling (ICML'25)☆109Updated 6 months ago
- ☆75Updated 2 weeks ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38Updated 9 months ago
- ☆112Updated last month
- ☆45Updated last month
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆81Updated last month
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆124Updated 8 months ago
- Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. Therefore, this projec…☆23Updated 11 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆80Updated last month
- ☆116Updated 4 months ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆58Updated 11 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆35Updated 11 months ago
- ☆18Updated last month
- ☆52Updated 2 weeks ago
- exploring whether LLMs perform case-based or rule-based reasoning☆29Updated last year
- [EMNLP 2024 Findings] Unlocking Continual Learning Abilities in Language Models☆25Updated 9 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.☆145Updated this week
- ☆91Updated 3 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆63Updated last week
- ☆33Updated last month