☆50 · Jan 28, 2025 · Updated last year
Alternatives and similar repositories for Mixture-of-Mamba
Users that are interested in Mixture-of-Mamba are comparing it to the libraries listed below.
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) · ☆16 · Jan 6, 2026 · Updated 4 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains · ☆49 · Feb 4, 2026 · Updated 3 months ago
- ☆19 · Nov 4, 2025 · Updated 6 months ago
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts · ☆52 · Nov 20, 2024 · Updated last year
- Fast and memory-efficient exact attention · ☆20 · Updated this week
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" · ☆15 · Apr 30, 2025 · Updated last year
- GoldFinch and other hybrid transformer components · ☆46 · Jul 20, 2024 · Updated last year
- ☆12 · Feb 22, 2024 · Updated 2 years ago
- ActSort is an active learning accelerated cell sorter tool for calcium imaging. · ☆26 · Apr 27, 2026 · Updated 3 weeks ago
- [AAAI 2025] RRT-MVS: Recurrent Regularization Transformer for Multi-View Stereo · ☆18 · Nov 4, 2025 · Updated 6 months ago
- A collection of resources and information for concrete skills that are helpful when pursuing a PhD in computer science (specifically in M… · ☆23 · Apr 18, 2023 · Updated 3 years ago
- Simple repository for training small reasoning models · ☆50 · Feb 17, 2026 · Updated 3 months ago
- #UAI2020 Codes for PAC-Bayesian Contrastive Unsupervised Representation Learning · ☆14 · May 23, 2022 · Updated 3 years ago
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More · ☆25 · Feb 25, 2025 · Updated last year
- [CVPR 2025] 2DMamba: Efficient State Space Model for Image Representation · ☆84 · Jan 29, 2026 · Updated 3 months ago
- This is the official repository for "Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion" (ICCV 2023). · ☆73 · Apr 30, 2024 · Updated 2 years ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". · ☆18 · Apr 25, 2025 · Updated last year
- XmodelLM · ☆38 · Nov 19, 2024 · Updated last year
- PyTorch Implementation of "BOOTPLACE: Bootstrapped Object Placement with Detection Transformers", CVPR 2025 · ☆27 · Aug 8, 2025 · Updated 9 months ago
- Principled learning method for Wasserstein distributionally robust optimization with local perturbations (ICML 2020) · ☆21 · Mar 24, 2023 · Updated 3 years ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. · ☆113 · Jan 14, 2026 · Updated 4 months ago
- Efficient Finetuning for OpenAI GPT-OSS · ☆24 · Oct 2, 2025 · Updated 7 months ago
- ☆15 · Mar 20, 2025 · Updated last year
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] · ☆102 · Jul 28, 2025 · Updated 9 months ago
- An LLM client for use from the command line or IDE. · ☆16 · Apr 30, 2026 · Updated 2 weeks ago
- [ACL Findings 2026] Official Implementation of "FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acc… · ☆31 · Apr 14, 2026 · Updated last month
- So, I trained a Llama a 130M architecture I coded from ground up to build a small instruct model from scratch. Trained on FineWeb dataset… · ☆17 · Mar 26, 2025 · Updated last year
- A mini-redis learn from tokio. · ☆12 · Dec 20, 2022 · Updated 3 years ago
- [CVPR 2025] CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning · ☆44 · Apr 21, 2025 · Updated last year
- Video Diffusion State Space Models · ☆19 · Mar 27, 2024 · Updated 2 years ago
- PyTorch implementation of Retentive Network: A Successor to Transformer for Large Language Models · ☆14 · Jul 20, 2023 · Updated 2 years ago
- ☆61 · May 13, 2025 · Updated last year
- Efficient Computation and Analysis of Distributional Shapley Values (AISTATS 2021) · ☆22 · Oct 19, 2023 · Updated 2 years ago
- [MMM'24 Oral] CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer · ☆18 · Apr 18, 2024 · Updated 2 years ago
- ☆17 · Feb 23, 2025 · Updated last year
- ☆13 · Oct 30, 2023 · Updated 2 years ago
- ☆10 · Apr 8, 2018 · Updated 8 years ago
- This repository is the official data collection of MMFundus (Multimodal Fundus) dataset. · ☆13 · Feb 2, 2026 · Updated 3 months ago
- PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability · ☆40 · Mar 18, 2025 · Updated last year