Awesome Multimodal Modeling [Covers MLLM, UMM, and NMM]
☆302Apr 28, 2026Updated this week
Alternatives and similar repositories for Awesome-Multimodal-Modeling
Users that are interested in Awesome-Multimodal-Modeling are comparing it to the libraries listed below.
- The official repo for "[WACV2025] Towards Accurate Unified Anomaly Segmentation"☆15Apr 14, 2025Updated last year
- Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights☆32Jan 9, 2026Updated 3 months ago
- Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning☆49Mar 25, 2026Updated last month
- This repository houses the code for the paper - "The Neglected of VLMs"☆30Dec 31, 2025Updated 4 months ago
- Implementation of <Symbolic Graphics Programming with Large Language Models>☆38Sep 14, 2025Updated 7 months ago
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer☆16Nov 21, 2024Updated last year
- ☆19Aug 7, 2025Updated 8 months ago
- DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency (AAAI24)☆61Aug 20, 2024Updated last year
- [SIGGRAPH 2026] OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation☆89Apr 8, 2026Updated 3 weeks ago
- The official implementation of the paper "Large Scale Knowledge Washing"☆10Jun 12, 2024Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆44Mar 11, 2025Updated last year
- [AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning☆12Dec 10, 2023Updated 2 years ago
- [ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.☆101Nov 1, 2025Updated 6 months ago
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM☆24Feb 10, 2026Updated 2 months ago
- A curated list of full-duplex spoken dialogue models & benchmarks☆61Updated this week
- ☆12Apr 19, 2024Updated 2 years ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆78Sep 19, 2025Updated 7 months ago
- An Empirical Study of GPT-4o Image Generation Capabilities☆29Apr 16, 2025Updated last year
- ☆15May 13, 2024Updated last year
- [ICCV 2023] Data-Free Class-Incremental Hand Gesture Recognition☆17Sep 21, 2023Updated 2 years ago
- Official Implementation of Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training☆181Mar 13, 2026Updated last month
- Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration☆28Feb 23, 2026Updated 2 months ago
- The official implementation of “One-for-More: Continual Diffusion Model for Anomaly Detection” (CVPR2025)☆62May 7, 2025Updated 11 months ago
- [ECCV 2024 Oral] Official code of "Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather".☆62Sep 21, 2024Updated last year
- ☆16Sep 3, 2025Updated 8 months ago
- ☆52Aug 22, 2025Updated 8 months ago
- An implementation of several unsupervised object discovery models (Slot Attention, SLATE, GNM) in PyTorch with pre-trained models.☆15May 26, 2025Updated 11 months ago
- Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder".☆148Dec 18, 2025Updated 4 months ago
- [CVPR 2026] Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video☆110Mar 24, 2026Updated last month
- [ICLR 2023 (Spotlight)] Domain-Indexing Variational Bayes: Interpretable Domain Index for Domain Adaptation☆40Jan 13, 2024Updated 2 years ago
- Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information☆11Sep 28, 2023Updated 2 years ago
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type.☆25Apr 21, 2026Updated last week
- This is the repo for "Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition", CVPR2025.☆22Dec 22, 2025Updated 4 months ago
- [WACV 2025] Official code of "SEED4D: A Synthetic Ego-Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark"☆23Sep 3, 2025Updated 8 months ago
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)☆14Mar 6, 2025Updated last year
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning☆19May 31, 2025Updated 11 months ago
- Implementation of <Model Merging with Functional Dual Anchors>☆47Nov 23, 2025Updated 5 months ago
- [ECCV 2024 Oral] Pyramid Diffusion for Fine 3D Large Scene Generation☆137Apr 4, 2025Updated last year
- ☆32Jan 28, 2026Updated 3 months ago