OpenMICG / mcg
Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA
☆11Updated last year
Alternatives and similar repositories for mcg
Users that are interested in mcg are comparing it to the libraries listed below
- Observation Driven Memory Synergistic Planning for Continuous Vision-Language Navigation☆10Updated last year
- Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation☆12Updated 5 months ago
- A consistent Med-VQA dataset, C-SLAKE, extended by Slake for further consistency assessment.☆13Updated last year
- Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering☆13Updated last year
- ☆12Updated last year
- CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning☆63Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆74Updated 11 months ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆11Updated 11 months ago
- ☆31Updated 9 months ago
- ☆34Updated last year
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)☆19Updated last year
- The champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023☆18Updated last year
- ☆13Updated 5 months ago
- Official pytorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding …☆42Updated 2 years ago
- Pytorch Implementation of ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary☆17Updated 2 years ago
- ☆11Updated 2 years ago
- Official Pytorch Implementation of 'BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos'☆32Updated 4 months ago
- ☆24Updated 2 months ago
- The official implementation of Error Detection in Egocentric Procedural Task Videos☆16Updated 10 months ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)☆34Updated 2 years ago
- ☆11Updated last year
- Note: DO NOT USE IT! THIS CODE IS PROVEN TO CONTAIN DATA LEAKAGE! Archive version of "Text Is MASS: Modeling as Stochastic Embedding for …☆15Updated last month
- SotA text-only image/video method (IJCAI 2023)☆16Updated last year
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆29Updated last week
- ☆8Updated 7 months ago
- A Video-to-Text Framework☆10Updated last year
- Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining☆27Updated 3 years ago
- [CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch.☆52Updated 2 years ago
- [TIP25] Code for "Text-Video Retrieval with Global-Local Semantic Consistent Learning"☆13Updated last month
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)☆158Updated 11 months ago