M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
☆46Jul 17, 2025Updated 8 months ago
Alternatives and similar repositories for M1
Users that are interested in M1 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models☆238Oct 14, 2025Updated 5 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…☆122Sep 13, 2024Updated last year
- ☆17Dec 19, 2024Updated last year
- Code and data for paper "(How) do Language Models Track State?"☆22Mar 31, 2025Updated 11 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule☆516Mar 13, 2026Updated last week
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- Here we will test various linear attention designs.☆62Apr 25, 2024Updated last year
- ☆13Dec 15, 2025Updated 3 months ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)☆15Jan 7, 2025Updated last year
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆85Jan 12, 2025Updated last year
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …☆16Sep 18, 2025Updated 6 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆56Aug 20, 2024Updated last year
- ☆31Dec 31, 2025Updated 2 months ago
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆64Mar 9, 2026Updated 2 weeks ago
- continous batching and parallel acceleration for RWKV6☆22Jun 28, 2024Updated last year
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- RADLADS training code☆37May 7, 2025Updated 10 months ago
- ☆16Nov 28, 2024Updated last year
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf☆21Jul 29, 2024Updated last year
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics☆72Jan 13, 2026Updated 2 months ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21Feb 9, 2026Updated last month
- Official codes for COLING 2024 paper "Robust and Scalable Model Editing for Large Language Models": https://arxiv.org/abs/2403.17431v1☆14Mar 27, 2024Updated last year
- Efficient retrieval head analysis with triton flash attention that supports topK probability☆13Jun 15, 2024Updated last year
- Privacy first GPS tracker. No accounts. No google services.☆29Nov 16, 2025Updated 4 months ago
- A simple video annotation made with python + OpenCV for detection in YoloV2 format☆16Oct 25, 2020Updated 5 years ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing☆49Jan 27, 2022Updated 4 years ago
- ☆51Jan 28, 2024Updated 2 years ago
- Sysdig agent Operator configure Sysdig platform in your Kubernetes cluster☆15Nov 2, 2023Updated 2 years ago
- Official Code Repository for the paper "Key-value memory in the brain"☆31Feb 25, 2025Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆59Jun 18, 2024Updated last year
- A side project that follows all the acceleration tricks in tinyllama, with the minimal modification to the huggingface transformers code.☆13Sep 2, 2024Updated last year
- ☆14Dec 25, 2024Updated last year
- [ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization☆15Apr 21, 2025Updated 11 months ago
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Experiments on the impact of depth in transformers and SSMs.☆41Oct 23, 2025Updated 5 months ago
- ☆35Apr 12, 2024Updated last year
- VT Chat - A modern, privacy-first AI chat application with security☆42Feb 27, 2026Updated 3 weeks ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)☆16Feb 11, 2023Updated 3 years ago
- A repository for DenseSSMs☆90Apr 11, 2024Updated last year
- ☆17Oct 22, 2024Updated last year
- ☆234Nov 19, 2025Updated 4 months ago