tuyunbin / SCORER
[ICCV 2023] This is the Pytorch code for our paper "Self-Supervised Cross-View Representation Reconstruction for Change Captioning".
☆19Updated 9 months ago
Alternatives and similar repositories for SCORER:
Users that are interested in SCORER are comparing it to the libraries listed below
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆46Updated 8 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆52Updated 9 months ago
- Composed Video Retrieval☆54Updated 11 months ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆34Updated 2 years ago
- NegCLIP.☆31Updated 2 years ago
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation☆61Updated last month
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Updated last year
- ICLR‘24 Offical Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization☆72Updated last year
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆41Updated 6 months ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆63Updated 10 months ago
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval☆53Updated 4 months ago
- ☆29Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆49Updated last year
- ☆30Updated 6 months ago
- [ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval☆16Updated 2 years ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆78Updated 3 months ago
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)☆50Updated last year
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023☆17Updated 10 months ago
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆61Updated 3 weeks ago
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆27Updated 11 months ago
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape…☆45Updated 2 months ago
- ☆16Updated last year
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)☆57Updated 3 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆34Updated last year
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 8 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆81Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆37Updated last month
- ☆24Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆55Updated 10 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆31Updated 2 weeks ago