yuanzhoulvpi2017 / mamba4transformers
☆13Updated last year
Alternatives and similar repositories for mamba4transformers
Users that are interested in mamba4transformers are comparing it to the libraries listed below
- ☆38Updated last month
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆87Updated 5 months ago
- ☆24Updated last year
- ☆120Updated 5 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆52Updated 3 months ago
- ☆103Updated 8 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆104Updated last week
- ☆33Updated 2 months ago
- RL Scaling and Test-Time Scaling (ICML'25)☆112Updated 7 months ago
- One-shot Entropy Minimization☆180Updated 2 months ago
- exploring whether LLMs perform case-based or rule-based reasoning☆30Updated last year
- ☆46Updated 2 months ago
- This is the implementation of LeCo☆31Updated 7 months ago
- ☆48Updated 2 months ago
- ☆18Updated 2 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆39Updated 10 months ago
- ☆65Updated 9 months ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆62Updated last year
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆81Updated 3 months ago
- Implementation of "Decoding-time Realignment of Language Models", ICML 2024.☆19Updated last year
- The official Github repository for paper "R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation" (EMNLP 2024 Fin…☆35Updated 8 months ago
- ☆45Updated last month
- [EMNLP 2024 Findings] Unlocking Continual Learning Abilities in Language Models☆25Updated 10 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆123Updated 7 months ago
- ☆20Updated 4 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆33Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection☆51Updated 10 months ago
- ☆151Updated last year
- ☆63Updated last year
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆90Updated 3 months ago