OpenMICG / AHP
Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation
☆12 Updated 5 months ago
Alternatives and similar repositories for AHP
Users who are interested in AHP are comparing it to the repositories listed below
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA ☆11 Updated last year
- A consistent Med-VQA dataset, C-SLAKE, extended from Slake for further consistency assessment. ☆13 Updated last year
- Observation Driven Memory Synergistic Planning for Continuous Vision-Language Navigation ☆10 Updated last year
- Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering ☆13 Updated last year
- ☆12 Updated last year
- SotA text-only image/video method (IJCAI 2023) ☆16 Updated last year
- [CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch. ☆52 Updated 2 years ago
- Local self-attention in Transformer for visual question answering ☆12 Updated last year
- ☆17 Updated last year
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 Updated last year
- CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning ☆63 Updated last year
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral) ☆34 Updated 2 years ago
- Code for Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning ☆15 Updated last year
- A curated publication list on visual dialog ☆14 Updated 2 years ago
- Multimodal Large Models Are Effective Action Anticipators (IEEE TMM)🌳 ☆23 Updated 5 months ago
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ☆19 Updated last year
- ☆34 Updated last year
- Baseline for REVERIE-Challenge using HOP ☆10 Updated 2 years ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆42 Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆74 Updated 11 months ago
- [TPAMI 2024] This is the PyTorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding". ☆17 Updated last month
- ☆11 Updated 2 years ago
- A Video-to-Text Framework ☆10 Updated last year
- Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022] ☆67 Updated last year
- Code for CVPR23 paper: Learning to Generate Language-supervised and Open-vocabulary Scene Graph using Pre-trained Visual-Semantic Space ☆37 Updated last year
- Official PyTorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding … ☆42 Updated 2 years ago
- The official implementation of “Cross-Modal Causal Representation Learning for Radiology Report Generation” (IEEE T-IP 2025) ☆52 Updated last month
- ☆11 Updated last year
- Video Graph Transformer for Video Question Answering (ECCV'22) ☆48 Updated 2 years ago
- [AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning ☆66 Updated last year