OpenMICG / AHP
Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation
☆12Updated last year
Alternatives and similar repositories for AHP
Users that are interested in AHP are comparing it to the libraries listed below
- A consistent Med-VQA dataset, C-SLAKE, extended by Slake for further consistency assessment.☆13Updated 2 years ago
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA☆11Updated 2 years ago
- ☆18Updated 2 years ago
- SotA text-only image/video method (IJCAI 2023)☆16Updated 2 years ago
- [TPAMI 2024] This is the official Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding"…☆27Updated 9 months ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)☆34Updated 3 years ago
- CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning☆65Updated last year
- Local self-attention in Transformer for visual question answering☆13Updated last year
- Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering☆13Updated 2 years ago
- Code for CVPR23 paper: Learning to Generate Language-supervised and Open-vocabulary Scene Graph using Pre-trained Visual-Semantic Space☆42Updated 2 years ago
- A simple and effective feature extractor for untrimmed videos☆13Updated 3 years ago
- ☆36Updated 2 years ago
- Code for Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning☆16Updated last year
- paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi…☆34Updated last month
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆17Updated 6 months ago
- Multimodal Large Models Are Effective Action Anticipators (IEEE TMM)🌳☆25Updated 5 months ago
- [CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch.☆50Updated 3 years ago
- The code of "Image-text Retrieval via Preserving Main Semantic of Vision" in ICME 2023.☆15Updated 2 years ago
- Note: DO NOT USE IT! THIS CODE IS PROVEN TO CONTAIN DATA LEAKAGE! Archive version of "Text Is MASS: Modeling as Stochastic Embedding for …☆21Updated 9 months ago
- [AAAI 2024] Official implementation of "Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation"☆42Updated last year
- ☆12Updated 2 years ago
- [ICCV'23] UATVR: Uncertainty-Adaptive Text-Video Retrieval☆13Updated 2 years ago
- ☆98Updated 3 years ago
- Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]☆69Updated last year
- The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.☆12Updated 4 years ago
- ☆16Updated 4 years ago
- ☆12Updated last year
- [CVPR 2025] Official implementation of the paper "DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for T…☆26Updated 7 months ago
- ☆194Updated last year
- Implementation for CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning".☆43Updated 3 years ago