FudanDISC / Awesome-Large-Multimodal-Models
Papers from "A Survey on Large Multi-Modal Models from the Perspective of Input-Output Space Extension"
☆11, updated 8 months ago
Alternatives and similar repositories for Awesome-Large-Multimodal-Models
Users who are interested in Awesome-Large-Multimodal-Models are comparing it to the libraries listed below.
- LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval (☆8, updated 8 months ago)
- [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering (☆41, updated 3 weeks ago)
- Official implementation of "Towards Open Vocabulary Video Semantic Segmentation" (☆10, updated 5 months ago)
- [ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding (☆9, updated 3 months ago)
- [ICCV 2025] ModPrompt: Visual Modality Prompt for Adapting Vision-Language Object Detectors (☆14, updated last month)
- [NeurIPS'24] MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts (☆16, updated 10 months ago)
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" (☆82, updated last year)
- CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification (AAAI 2025) (☆22, updated last month)
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) (☆69, updated last year)
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection (☆62, updated 5 months ago)
- Tracking with Human-Intent Reasoning (☆72, updated 9 months ago)
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at NeurIPS 2023 (Main confer… (☆21, updated last year)
- [CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" (☆67, updated 9 months ago)
- [CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning" (☆107, updated 2 months ago)
- Official code for Fine-Grained Visual Prompting, NeurIPS 2023 (☆54, updated last year)
- [AAAI 2025] DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification (☆58, updated 5 months ago)
- [NeurIPS 2024] SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion (☆85, updated 2 months ago)
- [CVPR 2025] Code release of F-LMM: Grounding Frozen Large Multimodal Models (☆100, updated 2 months ago)
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning (☆22, updated 11 months ago)
- [AAAI 2024] Code release of CLIM: Contrastive Language-Image Mosaic for Region Representation (☆29, updated last year)
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision (☆41, updated 4 months ago)
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati… (☆70, updated last year)
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning (☆33, updated last month)
- Code for the paper "Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models" (☆26, updated 7 months ago)
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs (☆91, updated 6 months ago)
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding (☆60, updated 9 months ago)