yuanzhoulvpi2017 / mamba4transformers
☆12Updated last year
Alternatives and similar repositories for mamba4transformers:
Users who are interested in mamba4transformers are comparing it to the libraries listed below:
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆45Updated 5 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆70Updated this week
- ☆22Updated 9 months ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…☆103Updated 3 weeks ago
- Exploring whether LLMs perform case-based or rule-based reasoning☆28Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆65Updated 2 months ago
- This is the implementation of LeCo☆32Updated 3 months ago
- ☆36Updated 2 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆37Updated 6 months ago
- ☆55Updated 6 months ago
- ☆144Updated 7 months ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆46Updated 8 months ago
- A repository for DenseSSMs☆87Updated last year
- This is an implementation of the paper "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" from google de…☆28Updated 3 weeks ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆91Updated last week
- To assess the longtext capabilities more comprehensively, we propose Needle-in-a-Haystack PLUS, which shifts the focus from simple fact r…☆11Updated last year
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment☆20Updated last year
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆68Updated last month
- ☆40Updated last month
- ☆94Updated last month
- Reproduction of the complete process of DeepSeek-R1 on small-scale models, including Pre-training, SFT, and RL.☆22Updated last month
- Parameter-Efficient Fine-Tuning for Foundation Models☆57Updated 3 weeks ago
- A Self-Training Framework for Vision-Language Reasoning☆76Updated 3 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆20Updated 2 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆74Updated last month
- ☆132Updated 9 months ago
- ☆40Updated 3 months ago
- The code of arxiv paper: "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis"☆24Updated 3 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code"☆55Updated 2 weeks ago
- ☆100Updated 9 months ago