OpenMICG / CoCoMeD
Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
☆13 Updated last year
Alternatives and similar repositories for CoCoMeD
Users who are interested in CoCoMeD are comparing it to the repositories listed below.
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA ☆11 Updated last year
- A consistent Med-VQA dataset, C-SLAKE, extended by Slake for further consistency assessment. ☆13 Updated last year
- Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation ☆12 Updated 5 months ago
- Observation Driven Memory Synergistic Planning for Continuous Vision-Language Navigation ☆10 Updated last year
- The code of IJCAI22 paper "GL-RG: Global-Local Representation Granularity for Video Captioning". ☆19 Updated 2 years ago
- [CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch. ☆52 Updated 2 years ago
- A Video-to-Text Framework ☆10 Updated last year
- Source code of our CVPR2024 paper TeachCLIP for Text-to-Video Retrieval ☆35 Updated last month
- Video Graph Transformer for Video Question Answering (ECCV'22) ☆48 Updated 2 years ago
- ☆14 Updated last year
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral) ☆34 Updated 2 years ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆74 Updated 11 months ago
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 Updated last year
- Span-based Localizing Network for Natural Language Video Localization (ACL 2020) ☆108 Updated 3 years ago
- ☆34 Updated last year
- ☆14 Updated last year
- Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022 ☆38 Updated 2 years ago
- ☆12 Updated 2 years ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment ☆52 Updated last year
- [ECCV2024] Nonverbal Interaction Detection ☆27 Updated 7 months ago
- The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023) ☆74 Updated 2 years ago
- [2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval ☆42 Updated 3 years ago
- (TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information ☆29 Updated 6 months ago
- Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction" ☆46 Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆58 Updated last year
- Official pytorch implementation of the AAAI 2021 paper "Semantic Grouping Network for Video Captioning" ☆54 Updated 3 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆21 Updated 4 months ago
- [ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts ☆14 Updated 5 months ago
- Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering ☆27 Updated 4 years ago
- paper list on Video Moment Retrieval (VMR), or Natural Language Video Localization (NLVL), or Temporal Sentence Grounding in Videos (TSGV… ☆31 Updated 2 years ago