USTC Cross-Modal Intelligence Group: Weekly Paper Sharing
☆16Nov 20, 2022Updated 3 years ago
Alternatives and similar repositories for paper-reading_CrossModelGroup-USTC
Users that are interested in paper-reading_CrossModelGroup-USTC are comparing it to the libraries listed below
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆24Aug 5, 2023Updated 2 years ago
- Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching.☆119Jun 19, 2023Updated 2 years ago
- ☆18Mar 21, 2025Updated 11 months ago
- ☆21Jun 3, 2023Updated 2 years ago
- Paper reading notes in the field of Image-Text Matching/Retrieval.☆13Mar 25, 2022Updated 3 years ago
- [AAAI'20] Code release for "HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs".☆38Oct 4, 2023Updated 2 years ago
- Implementation of our AAAI2022 paper, Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching.☆36Jun 16, 2023Updated 2 years ago
- Deep Semantic-Alignment Hashing (ICMR2020, Oral)☆18Oct 20, 2020Updated 5 years ago
- PyTorch implementation for Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation.☆18Jan 4, 2022Updated 4 years ago
- Implementation of our CVPR2020 paper, Graph Structured Network for Image-Text Matching☆170Oct 12, 2020Updated 5 years ago
- Implementation of our ACMMM2019 paper, Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching☆39Jun 19, 2023Updated 2 years ago
- The source code for the CVPR2020 paper "Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing".☆24Oct 10, 2020Updated 5 years ago
- RSTPReid Dataset for Text-based Person Retrieval.☆32Sep 2, 2022Updated 3 years ago
- Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)☆65Oct 19, 2020Updated 5 years ago
- Context-Aware Multi-View Summarization Network for Image-Text Matching. ACM MM'20☆29May 26, 2022Updated 3 years ago
- ☆34Jun 14, 2022Updated 3 years ago
- Mixture-of-Groups Attention for End-to-End Long Video Generation☆92Oct 22, 2025Updated 4 months ago
- [BMVC 2021] Text-Based Person Search with Limited Data☆47Aug 12, 2022Updated 3 years ago
- Cheatsheet for slurm command lines☆10Apr 9, 2023Updated 2 years ago
- ☆20Mar 10, 2025Updated 11 months ago
- CLIP-based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-modal Hashing Retrieval☆10Mar 18, 2024Updated last year
- ☆12Sep 11, 2021Updated 4 years ago
- ☆35May 4, 2021Updated 4 years ago
- ☆11Dec 27, 2022Updated 3 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆19Jul 10, 2025Updated 7 months ago
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation☆12Dec 5, 2025Updated 3 months ago
- ☆11Aug 20, 2025Updated 6 months ago
- I have created a dataset of Image-Text-Pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f…☆16Apr 22, 2021Updated 4 years ago
- ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language☆40Aug 27, 2020Updated 5 years ago
- This is code for Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model (ACM MM 2024)☆12Aug 27, 2024Updated last year
- Official implementation of ICCV 2025 paper - DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization☆22Jul 13, 2025Updated 7 months ago
- [NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning☆12Feb 9, 2025Updated last year
- Generative label fused network for image–text matching☆10Jan 13, 2023Updated 3 years ago
- Code for the paper "Multi-Task Learning of Object States and State-Modifying Actions from Web Videos" published in TPAMI☆11Mar 3, 2024Updated 2 years ago
- ☆10Nov 23, 2023Updated 2 years ago
- Calculation of the entropy of the batch of images (whole image or patches)☆10Oct 15, 2021Updated 4 years ago
- Official repository for Scone (Subject-driven Composition and Distinction Enhancement) model, designed to support multi-subject compositi…☆28Jan 14, 2026Updated last month
- [ICLR'25] Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?☆12Apr 11, 2025Updated 10 months ago
- Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing☆36Dec 9, 2020Updated 5 years ago