aimagelab / DiCO
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization (BMVC 2024 Oral ✨)
☆10 · Updated last week
Related projects:
- [CBMI2024] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?". ☆17 · Updated 2 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 · Updated 3 months ago
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023 ☆51 · Updated last year
- Multimodal Video Understanding Framework (MVU) ☆23 · Updated 4 months ago
- Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024 ☆53 · Updated 3 months ago
- [ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset ☆47 · Updated last month
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆15 · Updated 3 weeks ago
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs". ☆39 · Updated 3 weeks ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆45 · Updated 2 weeks ago
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect… ☆39 · Updated last month
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆65 · Updated 4 months ago
- Language Repository for Long Video Understanding ☆27 · Updated 3 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆17 · Updated 3 weeks ago
- Composed Video Retrieval ☆42 · Updated 4 months ago
- [ECCV 2024] - Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation ☆39 · Updated last week
- Official implementation of the paper "STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models" ☆15 · Updated last week
- [ECCVW 2024] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ☆12 · Updated 3 weeks ago
- PyTorch implementation of Twelve Labs' Video Foundation Model evaluation framework & open embeddings ☆16 · Updated 3 weeks ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆35 · Updated last month
- Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology. ☆22 · Updated 8 months ago
- Repo for the paper titled: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment (AAAI'24 Oral) ☆25 · Updated 4 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆20 · Updated 4 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆16 · Updated 3 weeks ago
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆72 · Updated 9 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆22 · Updated last week
- Vision Relation Transformer for Unbiased Scene Graph Generation (ICCV 2023) ☆21 · Updated 11 months ago
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023) ☆36 · Updated 9 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆39 · Updated last week