FudanDISC / Awesome-Large-Multimodal-ModelsLinks

Papers of "A Survey on Large Multi-Modal Models from the Perspective of Input-Output Space Extension"

☆11

Alternatives and similar repositories for Awesome-Large-Multimodal-Models

Users that are interested in Awesome-Large-Multimodal-Models are comparing it to the libraries listed below

Sorting:

swordlidev / LLaVA-MR
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
☆8Updated 8 months ago
dengandong / GroundMoRe
☆13Updated 4 months ago
aimagelab / ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
☆41Updated 3 weeks ago
AVC2-UESTC / OV2VSS
Official Implementation of Towards Open Vocabulary Video Semantic Segmentation
☆10Updated 5 months ago
aimagelab / COGT
[ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
☆9Updated 3 months ago
heitorrapela / ModPrompt
[ICCV2025] ModPrompt: Visual Modality Prompt for Adapting Vision-Language Object Detectors
☆14Updated last month
XiaokunFeng / MemVLT
[NeurIPS'24] MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
☆16Updated 10 months ago
Flowerfan / VistaLLaMA
☆14Updated 8 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆82Updated last year
AsuradaYuci / CLIMB-ReID
CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification（AAAI2025）
☆22Updated last month
om-ai-lab / GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
☆69Updated last year
LutingWang / OADP
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
☆62Updated 5 months ago
jiawen-zhu / TrackGPT
Tracking with Human-Intent Reasoning
☆72Updated 9 months ago
jinga-lala / DAMEX
Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…
☆21Updated last year
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67Updated 9 months ago
Koorye / DePT
[CVPR 2024] Offical implemention of the paper "DePT: Decoupled Prompt Tuning"
☆107Updated 2 months ago
ylingfeng / FGVP
Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023
☆54Updated last year
924973292 / DeMo
【AAAI2025】DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
☆58Updated 5 months ago
zhoujiahuan1991 / AAAI2025-SVP
☆14Updated 3 months ago
Dmmm1997 / SimVG
[NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
☆85Updated 2 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆100Updated 2 months ago
GasolSun36 / MVP
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆22Updated 11 months ago
wusize / CLIM
[AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation
☆29Updated last year
Shengcao-Cao / groundLMM
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
☆41Updated 4 months ago
Rubics-Xuan / MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆70Updated last year
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆33Updated last month
ustc-hyin / ClearSight
Code for paper: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models
☆26Updated 7 months ago
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆91Updated 6 months ago
WeitaiKang / SegVG
[ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
☆60Updated 9 months ago
Meituan-AutoML / Lenna
☆86Updated last year