marco-garosi / ComCa
Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection"
☆22Updated 11 months ago
Alternatives and similar repositories for ComCa
Users that are interested in ComCa are comparing it to the libraries listed below
- Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024☆67Updated last year
- [CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆92Updated 8 months ago
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆124Updated 3 months ago
- Code implementation of our ICCV 2025 paper: On Large Multimodal Models as Open-World Image Classifiers☆26Updated last week
- [CVPR24] Official Implementation of GEM (Grounding Everything Module)☆134Updated 8 months ago
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition☆93Updated 11 months ago
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆37Updated 8 months ago
- [ECCV 2024] - Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation☆64Updated 5 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆143Updated 11 months ago
- Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.☆266Updated last year
- ☆15Updated 9 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆199Updated last year
- Code for Scaling Language-Free Visual Representation Learning (WebSSL)☆245Updated 7 months ago
- [ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition"☆82Updated 9 months ago
- Composed Video Retrieval☆61Updated last year
- ☆190Updated last year
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception☆146Updated 6 months ago
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]☆76Updated 5 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Updated last year
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2…☆24Updated last year
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".☆118Updated 2 months ago
- Large-Vocabulary Video Instance Segmentation dataset☆95Updated last year
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion☆195Updated 4 months ago
- [ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation☆48Updated last year
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects☆55Updated last year
- [CVPR 2025] PyTorch implementation of T-CORE, introduced in "When the Future Becomes the Past: Taming Temporal Correspondence for Self-su…☆17Updated last month
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…☆50Updated last year
- This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …☆95Updated last year
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"☆92Updated 9 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆46Updated 11 months ago