Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
☆20Dec 21, 2023Updated 2 years ago
Alternatives and similar repositories for SCL
Users who are interested in SCL are comparing it to the repositories listed below
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆12Feb 27, 2024Updated 2 years ago
- ☆10Jan 9, 2025Updated last year
- The code of "Image-text Retrieval via Preserving Main Semantic of Vision" in ICME 2023.☆15Dec 25, 2023Updated 2 years ago
- Official code for the CVPR 2024 Paper "Can Biases in ImageNet Models Explain Generalization?".☆13Jun 24, 2024Updated last year
- ☆16Nov 26, 2024Updated last year
- [AAAI 2026] Segment Anything Across Shots: A Method and Benchmark☆27Nov 16, 2025Updated 3 months ago
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024☆18Oct 11, 2024Updated last year
- ☆45Aug 14, 2023Updated 2 years ago
- ☆22Mar 7, 2025Updated 11 months ago
- Code for WACV 2024 paper ✨ "SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective".☆18Nov 4, 2023Updated 2 years ago
- The PyTorch implementation for "Video-Text Pre-training with Learned Regions"☆43Jul 15, 2022Updated 3 years ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆53Apr 9, 2024Updated last year
- Official Code of CVPR'23 Paper "VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision"☆22Apr 21, 2024Updated last year
- [CVPR 2023] Egocentric Audio-Visual Object Localization☆26Jan 6, 2024Updated 2 years ago
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval☆26Jun 28, 2021Updated 4 years ago
- Official implementation of the paper DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction☆29May 26, 2024Updated last year
- Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].☆35Nov 2, 2024Updated last year
- RSTPReid Dataset for Text-based Person Retrieval.☆32Sep 2, 2022Updated 3 years ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"☆41Nov 15, 2024Updated last year
- An unofficial implementation for paper "DenseCLIP: Extract Free Dense Labels from CLIP"☆23Jan 27, 2022Updated 4 years ago
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight)☆34Jun 12, 2023Updated 2 years ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…☆12Oct 14, 2024Updated last year
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"☆37Oct 11, 2024Updated last year
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation☆31Dec 4, 2024Updated last year
- Unofficial implementation of CVPR2021 paper "Perceptual Image Quality Assessment with Transformers"☆75Oct 21, 2021Updated 4 years ago
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'☆33Nov 7, 2023Updated 2 years ago
- ☆11Mar 11, 2024Updated last year
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 2 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- ICCV 2021☆34May 11, 2022Updated 3 years ago
- ☆65Feb 23, 2026Updated last week
- This repo contains the code to reproduce figures in my dissertation "Passive Imaging and Characterization of the Subsurface With Distribu…☆10Jun 14, 2018Updated 7 years ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs☆157Jul 23, 2024Updated last year
- [ Official ] - PIPAL Dataset and Training Codebase. ECCV-2020, NTIRE-21/22.☆79Jan 3, 2022Updated 4 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- ☆36Jul 9, 2025Updated 7 months ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- A codebase for data crawling and preprocessing for TTS and ASR systems training.☆22Updated this week
- The sparse Bayesian learning sandbox☆11Jul 4, 2021Updated 4 years ago