PerceptualAI-Lab / GOAL
An official implementation of "GOAL⚽: Global-local Object Alignment Learning" (CVPR 2025).
☆21 Updated 4 months ago
Alternatives and similar repositories for GOAL
Users who are interested in GOAL are comparing it to the repositories listed below:
- AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025) ☆45 Updated 5 months ago
- Composed Video Retrieval ☆58 Updated last year
- [AAAI'25, CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models". ☆111 Updated 7 months ago
- Official PyTorch repository for GRAM ☆86 Updated 3 months ago
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 ☆16 Updated 10 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆136 Updated 3 months ago
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24) ☆25 Updated 8 months ago
- [CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection ☆26 Updated 10 months ago
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ☆49 Updated 11 months ago
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer ☆67 Updated 5 months ago
- This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on V… ☆13 Updated 6 months ago
- [ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners" ☆12 Updated 8 months ago
- code for FineLIP ☆27 Updated 4 months ago
- The official PyTorch implementation of our CVPR-2024 paper "MMA: Multi-Modal Adapter for Vision-Language Models". ☆76 Updated 3 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆45 Updated 8 months ago
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs ☆52 Updated last week
- ☆98 Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆100 Updated 2 months ago
- [ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes" ☆86 Updated last month
- [CVPR 2025] Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space ☆16 Updated 3 weeks ago
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-… ☆39 Updated 3 months ago
- ☆22 Updated last year
- [TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding ☆27 Updated 11 months ago
- [ECCV 2022] What to Hide from Your Students: Attention-Guided Masked Image Modeling ☆71 Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆60 Updated last year
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval ☆53 Updated 8 months ago
- ☆81 Updated 2 years ago
- [ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion ☆50 Updated 3 months ago
- ☆40 Updated last year
- Official Implementation of "Read-only Prompt Optimization for Vision-Language Few-shot Learning", ICCV 2023 ☆53 Updated last year