Raphoo / DCSM_Ideal_CLIP
Code for "Is CLIP ideal? No. Can we fix it? Yes!"
☆17Updated 5 months ago
Alternatives and similar repositories for DCSM_Ideal_CLIP
Users that are interested in DCSM_Ideal_CLIP are comparing it to the libraries listed below
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆31Updated 11 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆79Updated 10 months ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)☆29Updated last year
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number]☆53Updated 7 months ago
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset☆69Updated last week
- Code implementation of our ICCV 2025 paper: On Large Multimodal Models as Open-World Image Classifiers☆23Updated 3 weeks ago
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)☆39Updated last year
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆47Updated 7 months ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"☆33Updated last year
- [CVPR 2024 Best paper award candidate] EGTR: Extracting Graph from Transformer for Scene Graph Generation☆123Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆58Updated last year
- This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …☆93Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆86Updated last year
- [ICLR'25] Reconstructive Visual Instruction Tuning☆106Updated 4 months ago
- Test-Time Training on Video Streams☆64Updated 2 years ago
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]☆99Updated last year
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆29Updated last year
- [ICCV 2023] Prompt-aligned Gradient for Prompt Tuning☆166Updated 2 years ago
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".☆112Updated 4 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆59Updated last year
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)☆62Updated 7 months ago
- ☆80Updated 3 weeks ago
- VisualGPTScore for visio-linguistic reasoning☆27Updated last year
- [NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding☆52Updated last year
- ☆53Updated 9 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs☆47Updated 7 months ago
- ICLR'24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization☆73Updated last year
- An implementation of several unsupervised object discovery models (Slot Attention, SLATE, GNM) in PyTorch with pre-trained models.☆14Updated 3 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆39Updated 4 months ago
- (NeurIPS 2023) Open-set visual object query search & localization in long-form videos☆24Updated last year