shicaiwei123 / ICCV2025-GDL
The official code for Boosting Multimodal Learning via Disentangled Gradient Learning
☆26 · Updated last month
Alternatives and similar repositories for ICCV2025-GDL
Users who are interested in ICCV2025-GDL are comparing it to the repositories listed below:
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation ☆79 · Updated last month
- Official PyTorch repository for GRAM ☆110 · Updated 7 months ago
- [AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition ☆71 · Updated 11 months ago
- [AAAI'25]: Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP ☆18 · Updated 4 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆49 · Updated 2 months ago
- Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024) ☆14 · Updated last year
- The official implementation of "Cross-modal Causal Relation Alignment for Video Question Grounding. (CVPR 2025 Highlight)" ☆40 · Updated 7 months ago
- Latest Papers, Codes and Datasets on VTG-LLMs. ☆63 · Updated last month
- [CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practi… ☆41 · Updated last month
- Code for dmrnet ☆29 · Updated 5 months ago
- [CVPR 2025] Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation ☆25 · Updated 5 months ago
- Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024 ☆24 · Updated last year
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation ☆33 · Updated last year
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-… ☆39 · Updated 8 months ago
- A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV… ☆75 · Updated 4 months ago
- [CVPR'25] EMOE: Modality-Specific Enhanced Dynamic Emotion Experts ☆99 · Updated 5 months ago
- [CVPR 2023] LOGO: A Long-Form Video Dataset for Group Action Quality Assessment ☆46 · Updated last year
- Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition" ☆50 · Updated last year
- Official implementation of paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation". ☆24 · Updated last year
- Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (ACM SIGIR 2024, Pytorch Code) ☆24 · Updated 10 months ago
- The official code for Improving Multimodal Learning via Imbalanced Learning ☆18 · Updated last week
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. ☆46 · Updated last year
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos ☆35 · Updated last month
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆43 · Updated 6 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models ☆40 · Updated 3 weeks ago
- [TAFFC 2025] The official implementation of paper: Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using… ☆22 · Updated 2 weeks ago
- Official Repo for CVPR 2024 Paper "FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentatio… ☆82 · Updated 6 months ago
- (NeurIPS 2023) Open-set visual object query search & localization in long-form videos ☆25 · Updated last year
- Codes of the Fine-grained Textual Inversion network for Zero-Shot Composed Image Retrieval ☆27 · Updated 8 months ago
- Official Implementation of GENIUS: A Generative Framework for Universal Multimodal Search, CVPR 2025 ☆42 · Updated 4 months ago