OpenMICG / AHP
Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation
☆12Updated last year
Alternatives and similar repositories for AHP
Users who are interested in AHP are comparing it to the repositories listed below
- A consistent Med-VQA dataset, C-SLAKE, extended by Slake for further consistency assessment.☆13Updated 2 years ago
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA☆11Updated 2 years ago
- Local self-attention in Transformer for visual question answering☆13Updated last year
- A curated publication list on visual dialog☆14Updated 2 years ago
- Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering☆13Updated 2 years ago
- SotA text-only image/video method (IJCAI 2023)☆16Updated 2 years ago
- ☆18Updated 2 years ago
- CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning☆65Updated last year
- ☆24Updated 4 months ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)☆34Updated 3 years ago
- [CVPR 2022] Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization☆46Updated 2 years ago
- ☆36Updated 2 years ago
- [CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch.☆50Updated 3 years ago
- Multimodal Large Models Are Effective Action Anticipators (IEEE TMM)🌳☆25Updated 5 months ago
- paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi…☆34Updated last month
- ☆12Updated last year
- [TPAMI 2024] This is the official Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding"…☆27Updated 9 months ago
- (CVPR2024) Realigning Confidence with Temporal Saliency Information for Point-level Weakly-Supervised Temporal Action Localization☆19Updated last year
- ☆194Updated last year
- ☆12Updated 2 years ago
- ☆29Updated 6 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆17Updated 6 months ago
- Code for CVPR23 paper: Learning to Generate Language-supervised and Open-vocabulary Scene Graph using Pre-trained Visual-Semantic Space☆42Updated 2 years ago
- [AAAI 2024] Official implementation of "Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation"☆42Updated last year
- ☆13Updated 6 months ago
- The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.☆12Updated 4 years ago
- [CVPR 2023] Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization☆12Updated last year
- ☆16Updated 4 years ago
- [ECCV 2022] Dual-Evidential Learning for Weakly-supervised Temporal Action Localization☆49Updated last year
- Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …☆61Updated 3 years ago