Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA
☆12Dec 13, 2023Updated 2 years ago
Alternatives and similar repositories for mcg
Users who are interested in mcg are comparing it to the libraries listed below.
- A consistent Med-VQA dataset, C-SLAKE, extended by Slake for further consistency assessment.☆17Jan 12, 2024Updated 2 years ago
- Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering☆16Jan 12, 2024Updated 2 years ago
- Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation☆15Jan 25, 2025Updated last year
- Observation Driven Memory Synergistic Planning for Continuous Vision-Language Navigation☆33Jun 14, 2024Updated last year
- [The Visual Computer] The official implementation of "Feature Distribution Normalization Network for Multi-View Stereo".☆15Mar 5, 2025Updated last year
- ☆13Oct 15, 2025Updated 7 months ago
- [IEEE JSTARS] The official implementation of "Surface Depth Estimation from Multi-view Stereo Satellite Images with Distribution Contrast…☆11May 16, 2025Updated last year
- ☆17Jul 21, 2022Updated 3 years ago
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026)☆43Mar 8, 2026Updated 3 months ago
- ☆14Feb 26, 2024Updated 2 years ago
- Repository of our accepted CVPR2022 paper "Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-La…☆28Mar 4, 2022Updated 4 years ago
- Official code for "RAG Meets Temporal Graphs: Time-Sensitive Modeling and Retrieval for Evolving Knowledge".☆34Feb 25, 2026Updated 3 months ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models☆37Feb 21, 2026Updated 3 months ago
- ☆39Mar 19, 2026Updated 2 months ago
- ☆33Feb 12, 2026Updated 3 months ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding☆58May 1, 2026Updated last month
- PyTorch implementation of video captioning☆13Sep 24, 2017Updated 8 years ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Apr 18, 2026Updated last month
- code for paper Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering☆14Aug 13, 2024Updated last year
- ☆15Aug 12, 2022Updated 3 years ago
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents☆42Apr 13, 2026Updated last month
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆15Apr 7, 2026Updated 2 months ago
- A Layered Memory Network for MovieQA☆16Apr 27, 2018Updated 8 years ago
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆16Oct 25, 2024Updated last year
- A Multi-Agent Approach Integrating Socratic Guidance for Automated Prompt Optimization☆18Dec 15, 2025Updated 5 months ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning☆41Mar 12, 2026Updated 2 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆21Feb 14, 2025Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- [arXiv'25] LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning☆45Jan 6, 2026Updated 5 months ago
- CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning☆65Mar 22, 2026Updated 2 months ago
- The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering☆20May 10, 2022Updated 4 years ago
- PyTorch code for ROLL, a knowledge-based video story question answering model.☆21Sep 29, 2020Updated 5 years ago
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…☆25Jul 21, 2024Updated last year
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"☆50Jun 2, 2026Updated last week
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆29Jun 6, 2025Updated last year
- MokA: Multimodal Low-Rank Adaptation for MLLMs☆90Dec 30, 2025Updated 5 months ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆18Oct 9, 2024Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding☆31Apr 23, 2025Updated last year
- The source code of Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents.☆85Jan 31, 2026Updated 4 months ago