yuanzhoulvpi2017 / mamba4transformers
☆11Updated last year
Alternatives and similar repositories for mamba4transformers:
Users who are interested in mamba4transformers are comparing it to the libraries listed below:
- This is a repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆49Updated last month
- This is the implementation of LeCo☆30Updated last month
- ☆22Updated 7 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆112Updated 3 months ago
- [ICLR 2025] SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights☆55Updated last week
- ☆26Updated 2 months ago
- Exploring whether LLMs perform case-based or rule-based reasoning☆28Updated 11 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”.☆29Updated last month
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆35Updated 4 months ago
- ☆48Updated 11 months ago
- ☆45Updated 4 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales☆31Updated last year
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆36Updated 6 months ago
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning.☆128Updated 2 weeks ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆45Updated 7 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. Accepted at COLM 2024.☆28Updated 8 months ago
- ☆58Updated 5 months ago
- ☆69Updated last week
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆81Updated last week
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning☆24Updated 2 weeks ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆77Updated 7 months ago
- ☆57Updated 2 months ago
- OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure☆21Updated 6 months ago
- ☆93Updated 7 months ago
- ☆45Updated 8 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆40Updated 3 months ago