ChenAnno / SPIRIT_TOMM2024
Official implementation for "SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback"
☆15 Updated 8 months ago
Alternatives and similar repositories for SPIRIT_TOMM2024:
Users who are interested in SPIRIT_TOMM2024 are comparing it to the repositories listed below.
- Official implementation for "FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval" ☆18 Updated 8 months ago
- Official implementation for "Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval" ☆26 Updated 8 months ago
- Codes of the Fine-grained Textual Inversion network for Zero-Shot Composed Image Retrieval ☆20 Updated 7 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆17 Updated this week
- CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning ☆62 Updated 11 months ago
- Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization ☆14 Updated last year
- Official GitHub repo for the ICCV 2023 paper 'Multi-event Video-Text Retrieval' ☆18 Updated last year
- Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral] ☆50 Updated 4 months ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24] ☆10 Updated 8 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering ☆17 Updated 7 months ago
- ☆28 Updated 6 months ago
- ☆12 Updated last year
- [SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval ☆31 Updated 8 months ago
- A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV… ☆22 Updated 3 weeks ago
- SotA text-only image/video method (IJCAI 2023) ☆16 Updated last year
- ☆13 Updated this week
- This is a summary of research on noisy correspondence. There may be omissions. If anything is missing please get in touch with us. Our em… ☆56 Updated last week
- ☆69 Updated last year
- R1-like Video-LLM for Temporal Grounding ☆62 Updated last week
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆22 Updated last month
- Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (ACM SIGIR 2024, PyTorch Code) ☆23 Updated last month
- The code of the paper "Negative Pre-aware for Noisy Cross-modal Matching" in AAAI 2024. ☆23 Updated 10 months ago
- ☆24 Updated 6 months ago
- The official repo for the EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimo… ☆19 Updated 2 weeks ago
- [CVPR25] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation mod… ☆14 Updated last month
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer ☆41 Updated 6 months ago
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024 ☆29 Updated last year
- Code for downloading videos from the HowTo100M dataset ☆16 Updated 3 years ago
- [2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding ☆30 Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆40 Updated 11 months ago