VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
☆54Mar 24, 2026Updated this week
Alternatives and similar repositories for VideoDetective
Users that are interested in VideoDetective are comparing it to the libraries listed below.
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Jan 1, 2026Updated 2 months ago
- [EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs☆13Dec 28, 2024Updated last year
- Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"☆15Aug 30, 2021Updated 4 years ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…☆77Apr 28, 2025Updated 10 months ago
- ThalamusDB: semantic query processing on multimodal data☆115Aug 27, 2025Updated 6 months ago
- [CVPR 2026] Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding☆73Updated this week
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026)☆37Mar 8, 2026Updated 2 weeks ago
- ☆13Feb 26, 2024Updated 2 years ago
- Verify MAPPO in task 'simple_spread_v3'☆15Aug 10, 2024Updated last year
- ☆15Jul 10, 2019Updated 6 years ago
- Benchmarking Semantic Query Processing Engines☆52Updated this week
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models☆37Feb 21, 2026Updated last month
- ☆27Feb 12, 2026Updated last month
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation☆31Mar 28, 2025Updated 11 months ago
- OpenAI's Gym Car-Racing-V0 environment was tackled and, subsequently, solved using a variety of Reinforcement Learning methods including …☆21Aug 7, 2022Updated 3 years ago
- [NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression☆50Mar 13, 2026Updated last week
- ☆15Aug 12, 2022Updated 3 years ago
- ☆53Jan 5, 2026Updated 2 months ago
- Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion☆101Mar 12, 2026Updated last week
- [ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding"☆71Oct 31, 2025Updated 4 months ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning☆36Mar 12, 2026Updated last week
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Jan 31, 2024Updated 2 years ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 3 months ago
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆16Oct 25, 2024Updated last year
- Hefei University of Technology computer science comprehensive hardware design, simplified version: single-cycle MIPS CPU☆37Feb 16, 2023Updated 3 years ago
- EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search☆25Apr 10, 2019Updated 6 years ago
- A Multi-Agent Approach Integrating Socratic Guidance for Automated Prompt Optimization☆17Dec 15, 2025Updated 3 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆19Feb 14, 2025Updated last year
- [ECCV 2024] Official code repository of paper titled "Efficient 3D-Aware Facial Image Editing Via Attribute-Specific Prompt Learning"☆10Aug 2, 2024Updated last year
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 4 months ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- [ICLR 2026] Official Implementation of ProxyThinker: Test-Time Guidance through Small Visual Reasoners.☆20Sep 24, 2025Updated 6 months ago
- [arXiv'25] LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning☆33Jan 6, 2026Updated 2 months ago
- The source code of Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents.☆37Jan 31, 2026Updated last month
- Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA☆11Dec 13, 2023Updated 2 years ago
- [ICLR 2026] Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks☆30Feb 5, 2026Updated last month
- collab-dev - Collaboration Metrics for Code Reviews☆23May 12, 2025Updated 10 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆45Mar 16, 2026Updated last week
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Finding]"☆16Aug 27, 2025Updated 6 months ago