Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
☆10 · Jul 22, 2024 · Updated last year
Alternatives and similar repositories for GCG
Users who are interested in GCG are comparing it to the repositories listed below
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆18 · Oct 9, 2024 · Updated last year
- ☆13 · Feb 26, 2024 · Updated 2 years ago
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ☆23 · Aug 18, 2025 · Updated 6 months ago
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 · Mar 9, 2024 · Updated last year
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆31 · Jun 28, 2024 · Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆29 · Sep 27, 2024 · Updated last year
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 · Oct 14, 2024 · Updated last year
- Motion Question Answering via Modular Motion Programs ☆38 · May 24, 2023 · Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet. ☆42 · Feb 10, 2026 · Updated 2 weeks ago
- [CVPR 2023] Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition ☆37 · Oct 24, 2024 · Updated last year
- Code for the paper "Faster Neural Network Training with Approximate Tensor Operations" ☆10 · Oct 23, 2021 · Updated 4 years ago
- ☆13 · Aug 28, 2024 · Updated last year
- ☆11 · Jul 31, 2020 · Updated 5 years ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆35 · Feb 21, 2026 · Updated last week
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆54 · May 25, 2025 · Updated 9 months ago
- ☆23 · Feb 12, 2026 · Updated 2 weeks ago
- Code for MME-SID accepted to CIKM 2025 Full Research track. ☆27 · Oct 29, 2025 · Updated 4 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆20 · Aug 1, 2025 · Updated 7 months ago
- quagga ☆10 · Apr 7, 2020 · Updated 5 years ago
- ☆16 · Oct 9, 2024 · Updated last year
- Agentic Keyframe Search for Video Question Answering ☆16 · Apr 7, 2025 · Updated 10 months ago
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs ☆23 · Sep 21, 2025 · Updated 5 months ago
- [ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring ☆24 · Aug 8, 2025 · Updated 6 months ago
- Project for SNARE benchmark ☆11 · Jun 5, 2024 · Updated last year
- Implementation of the paper 'Stochastic Wasserstein Barycenters' ☆11 · Oct 17, 2018 · Updated 7 years ago
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision ☆12 · Sep 17, 2023 · Updated 2 years ago
- Fully buildable project files of Little Lead-rical Leader Pack (Yuni, Kokkoro, Kyoka), a leader mod for Sid Meier's Civilization VI. ☆12 · Aug 13, 2023 · Updated 2 years ago
- [EMNLP 2024 Industry track] MERLIN : Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank P… ☆14 · Mar 4, 2025 · Updated 11 months ago
- ☆12 · Dec 15, 2023 · Updated 2 years ago
- Code for paper "W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering" ☆15 · Oct 2, 2025 · Updated 5 months ago
- [NDSS 2026] Official repo for Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography ☆29 · Jan 2, 2026 · Updated 2 months ago
- ☆11 · Sep 15, 2023 · Updated 2 years ago
- Visualization of China's historical GDP and population data ☆13 · Jan 18, 2023 · Updated 3 years ago
- ☆14 · Oct 18, 2024 · Updated last year
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning ☆30 · Dec 17, 2025 · Updated 2 months ago
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA ☆11 · Dec 13, 2023 · Updated 2 years ago
- [NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding ☆53 · Mar 5, 2024 · Updated last year
- ☆15 · May 14, 2025 · Updated 9 months ago