fortunechen / paper-reading_CrossModelGroup-USTC
USTC Cross-Modal Intelligence Group: weekly paper sharing
☆16Nov 20, 2022Updated 3 years ago
Alternatives and similar repositories for paper-reading_CrossModelGroup-USTC
Users that are interested in paper-reading_CrossModelGroup-USTC are comparing it to the libraries listed below
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆24Aug 5, 2023Updated 2 years ago
- CVPR 2025 Accepted Papers☆23Dec 20, 2025Updated last month
- Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching.☆119Jun 19, 2023Updated 2 years ago
- ☆21Jun 3, 2023Updated 2 years ago
- Paper reading notes in the field of Image-Text Matching/Retrieval.☆13Mar 25, 2022Updated 3 years ago
- ☆13Feb 1, 2022Updated 4 years ago
- Implementation of our AAAI2022 paper, Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching.☆36Jun 16, 2023Updated 2 years ago
- Implementation of our CVPR2020 paper, Graph Structured Network for Image-Text Matching☆170Oct 12, 2020Updated 5 years ago
- Implementation of our ACMMM2019 paper, Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching☆39Jun 19, 2023Updated 2 years ago
- The source code for the CVPR2020 paper "Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing".☆24Oct 10, 2020Updated 5 years ago
- RSTPReid Dataset for Text-based Person Retrieval.☆32Sep 2, 2022Updated 3 years ago
- Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)☆65Oct 19, 2020Updated 5 years ago
- Context-Aware Multi-View Summarization Network for Image-Text Matching. ACM MM'20☆29May 26, 2022Updated 3 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆18Jul 10, 2025Updated 7 months ago
- code base for vision transformers☆36Dec 4, 2021Updated 4 years ago
- [BMVC 2021] Text-Based Person Search with Limited Data☆47Aug 12, 2022Updated 3 years ago
- [AAAI2024] An official PyTorch implementation of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst…☆13Dec 8, 2024Updated last year
- [MM'22 Oral] AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation☆11Apr 3, 2023Updated 2 years ago
- I have created a dataset of image-text pairs by using the cosine similarity of the CLIP embeddings of the image and its caption derived f…☆15Apr 22, 2021Updated 4 years ago
- ☆11Aug 20, 2025Updated 5 months ago
- ☆11Dec 27, 2022Updated 3 years ago
- ☆12Sep 11, 2021Updated 4 years ago
- Learning Fragment Self-Attention Embeddings for Image-Text Matching, in ACM MM 2019☆41Sep 24, 2019Updated 6 years ago
- ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language☆40Aug 27, 2020Updated 5 years ago
- ☆12Dec 20, 2024Updated last year
- ☆10Nov 23, 2023Updated 2 years ago
- Dual-path CNN with Max Gated block for Text-Based Person Re-identification☆10Dec 5, 2020Updated 5 years ago
- This is the code for Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model (ACM MM 2024)☆12Aug 27, 2024Updated last year
- Official repository for Scone (Subject-driven Composition and Distinction Enhancement) model, designed to support multi-subject compositi…☆28Jan 14, 2026Updated last month
- Official implementation of ICCV 2025 paper - DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization☆22Jul 13, 2025Updated 7 months ago
- Simple TensorFlow implementation of "MirrorGAN: Learning Text-to-image Generation by Redescription" (CVPR 2019)☆15Mar 23, 2020Updated 5 years ago
- Calculation of the entropy of the batch of images (whole image or patches)☆10Oct 15, 2021Updated 4 years ago
- Code for the paper "Multi-Task Learning of Object States and State-Modifying Actions from Web Videos" published in TPAMI☆11Mar 3, 2024Updated last year
- [NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning☆12Feb 9, 2025Updated last year
- CLIP-based Fusion-modal Reconstructing Hashing for Unsupervised Large-scale Cross-modal Retrieval☆13Aug 7, 2023Updated 2 years ago
- Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing☆36Dec 9, 2020Updated 5 years ago
- ☆45Dec 26, 2021Updated 4 years ago
- [ECCV 2020] Official code for "Comprehensive Image Captioning via Scene Graph Decomposition"☆99Aug 20, 2024Updated last year
- [arXiv 2023] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection"☆15Aug 30, 2023Updated 2 years ago