OpenMICG / CoCoMeD
Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
☆13 Updated 2 years ago
Alternatives and similar repositories for CoCoMeD
Users who are interested in CoCoMeD are comparing it to the repositories listed below.
- [CVPR 2024] TeachCLIP for Text-to-Video Retrieval ☆42 Updated 9 months ago
- ☆27 Updated 3 years ago
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA ☆11 Updated 2 years ago
- ☆25 Updated 3 years ago
- [2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval ☆42 Updated 4 years ago
- Video Graph Transformer for Video Question Answering (ECCV'22) ☆49 Updated 2 years ago
- Fast Contextual Scene Graph Generation with Unbiased Context Augmentation ☆12 Updated 2 years ago
- (TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information ☆32 Updated last year
- [ECCV2024] Nonverbal Interaction Detection ☆28 Updated last year
- [ECCV 2022] Dual-Evidential Learning for Weakly-supervised Temporal Action Localization ☆49 Updated last year
- Official implementation of "Recovering the Unbiased Scene Graphs from the Biased Ones" (ACMMM 2021) ☆78 Updated 3 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 Updated last year
- ☆29 Updated 2 years ago
- Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction" ☆48 Updated last year
- https://layer6ai-labs.github.io/xpool/ ☆134 Updated 2 years ago
- ☆97 Updated 3 years ago
- ☆43 Updated 4 years ago
- [ACM MM-24] Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization ☆12 Updated last year
- This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021) ☆36 Updated 3 years ago
- ☆48 Updated 2 years ago
- ☆13 Updated 3 years ago
- paper list on Video Moment Retrieval (VMR), or Natural Language Video Localization (NLVL), or Temporal Sentence Grounding in Videos (TSGV… ☆37 Updated 3 years ago
- Vision Relation Transformer for Unbiased Scene Graph Generation (ICCV 2023) ☆22 Updated 2 years ago
- ☆14 Updated 2 years ago
- ☆85 Updated 2 years ago
- The code of IJCAI22 paper "GL-RG: Global-Local Representation Granularity for Video Captioning". ☆18 Updated 2 years ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning. ☆24 Updated 2 years ago
- [CVPR2023] Context De-confounded Emotion Recognition ☆18 Updated 2 years ago
- A video database bridging human actions and human-object relationships ☆155 Updated 5 years ago
- ☆39 Updated 2 years ago