zjukg / Structure-CLIP
[Paper][AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
☆104 · Updated 2 months ago
Related projects:
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆21 · Updated 5 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆128 · Updated 4 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU ☆37 · Updated 11 months ago
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024 ☆19 · Updated 5 months ago
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models ☆30 · Updated 6 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆57 · Updated 3 months ago
- Code for studying OpenAI's CLIP explainability ☆25 · Updated 2 years ago
- [SIGIR 2024] Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval ☆19 · Updated 2 months ago
- Implementation of our paper "Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval" ☆19 · Updated 9 months ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa… ☆41 · Updated 2 months ago
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024) ☆62 · Updated 7 months ago
- Update 2020 ☆68 · Updated 2 years ago
- (CVPR 2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆31 · Updated last month
- Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 202… ☆48 · Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆27 · Updated 5 months ago
- Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text … ☆11 · Updated 2 weeks ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" ☆63 · Updated 4 months ago
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection ☆25 · Updated 7 months ago
- This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"… ☆41 · Updated 6 months ago
- This is a summary of research on noisy correspondence. There may be omissions. If anything is missing please get in touch with us. Our em… ☆35 · Updated last week
- [Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning ☆46 · Updated 7 months ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"