xian-sh / UniSDNet
☆9 · Updated 3 months ago
Related projects:
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023) ☆54 · Updated 7 months ago
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ☆46 · Updated last year
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆28 · Updated 5 months ago
- [TPAMI 2024] This is the Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding". ☆13 · Updated 3 months ago
- [CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception ☆34 · Updated last year
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching” ☆28 · Updated 5 months ago
- Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos ☆18 · Updated 2 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆38 · Updated 2 months ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆31 · Updated last month
- Official implementation of "Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval (CVPR 2024 Highlight)" ☆44 · Updated last month
- (CVPR2024) Realigning Confidence with Temporal Saliency Information for Point-level Weakly-Supervised Temporal Action Localization ☆17 · Updated 3 months ago
- The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS… ☆21 · Updated 4 months ago
- The code of the paper "Negative Pre-aware for Noisy Cross-modal Matching" in AAAI 2024. ☆15 · Updated 4 months ago
- Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization ☆13 · Updated last year
- ☆24 · Updated 5 months ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆50 · Updated 3 months ago
- [ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts ☆10 · Updated 11 months ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment ☆44 · Updated 5 months ago
- [CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning ☆103 · Updated 5 months ago
- The implementation of a paper entitled "Action Knowledge for Video Captioning with Graph Neural Networks" (JKSUCIS 2023). ☆14 · Updated last year
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape… ☆30 · Updated last month
- This is a summary of research on noisy correspondence. There may be omissions. If anything is missing please get in touch with us. Our em… ☆35 · Updated last week
- ☆25 · Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆38 · Updated 3 months ago
- [TCSVT 2024] Official implementation of the paper: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications ☆14 · Updated 3 weeks ago
- MomentDiff: Generative Video Moment Retrieval from Random to Real--NeurIPS 2023 ☆71 · Updated 10 months ago
- The code of MGCC: Text-based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning ☆11 · Updated 2 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆17 · Updated 3 months ago
- ☆25 · Updated last year
- ☆15 · Updated this week