☆51 · Jan 28, 2025 · Updated last year
Alternatives and similar repositories for Mixture-of-Mamba
Users that are interested in Mixture-of-Mamba are comparing it to the libraries listed below.
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆49 · Feb 4, 2026 · Updated 4 months ago
- KV cache compression via sparse coding ☆18 · Oct 26, 2025 · Updated 8 months ago
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- ☆19 · Nov 4, 2025 · Updated 7 months ago
- LoG-VMamba: Local-Global Vision Mamba for Medical Image Segmentation ☆28 · Oct 18, 2024 · Updated last year
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts ☆53 · Nov 20, 2024 · Updated last year
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆16 · Apr 30, 2025 · Updated last year
- GoldFinch and other hybrid transformer components ☆46 · Jul 20, 2024 · Updated last year
- Histomic Prognostic Signature (HiPS): A population-level computational histologic signature for invasive breast cancer prognosis ☆33 · Apr 9, 2024 · Updated 2 years ago
- Differential equation neural operator ☆22 · Sep 4, 2023 · Updated 2 years ago
- Dynamic config system based on python classes ☆12 · Jan 27, 2023 · Updated 3 years ago
- LLM as World Models using Bayesian inference ☆21 · May 27, 2025 · Updated last year
- [AAAI 2025] RRT-MVS: Recurrent Regularization Transformer for Multi-View Stereo ☆18 · Nov 4, 2025 · Updated 7 months ago
- [ICME 2025] ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo ☆17 · May 26, 2026 · Updated last month
- Simple repository for training small reasoning models ☆52 · Feb 17, 2026 · Updated 4 months ago
- #UAI2020 Codes for PAC-Bayesian Contrastive Unsupervised Representation Learning ☆14 · May 23, 2022 · Updated 4 years ago
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More ☆25 · Feb 25, 2025 · Updated last year
- [CVPR 2025] 2DMamba: Efficient State Space Model for Image Representation ☆84 · Jan 29, 2026 · Updated 5 months ago
- Decoding of the speech envelope from EEG using the VLAAI deep neural network ☆14 · Sep 28, 2022 · Updated 3 years ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated last year
- XmodelLM ☆38 · Nov 19, 2024 · Updated last year
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆116 · Jan 14, 2026 · Updated 5 months ago
- Efficient Finetuning for OpenAI GPT-OSS ☆24 · Oct 2, 2025 · Updated 8 months ago
- ☆15 · Mar 20, 2025 · Updated last year
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] ☆105 · Jul 28, 2025 · Updated 11 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆610 · Mar 13, 2026 · Updated 3 months ago
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆24 · Oct 13, 2025 · Updated 8 months ago
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening ☆73 · May 18, 2025 · Updated last year
- [CVPR 2025] CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning ☆48 · Apr 21, 2025 · Updated last year
- PyTorch implementation of Retentive Network: A Successor to Transformer for Large Language Models ☆14 · Jul 20, 2023 · Updated 2 years ago
- Code for "StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model", AAAI2026 Oral ☆57 · Jun 15, 2026 · Updated 2 weeks ago
- ☆14 · Dec 12, 2024 · Updated last year
- Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024) ☆10 · Oct 16, 2024 · Updated last year
- ☆17 · Feb 23, 2025 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆89 · Mar 27, 2026 · Updated 3 months ago
- [ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners" ☆14 · Dec 1, 2024 · Updated last year
- ☆13 · Oct 30, 2023 · Updated 2 years ago
- Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images ☆37 · Jun 3, 2025 · Updated last year
- [ICML 2026] Heima ☆73 · May 20, 2026 · Updated last month