Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆32 · Nov 24, 2025 · Updated 5 months ago
Alternatives and similar repositories for LVAgent
Users that are interested in LVAgent are comparing it to the libraries listed below.
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos ☆11 · Jun 11, 2024 · Updated last year
- Agentic Keyframe Search for Video Question Answering ☆18 · Apr 7, 2025 · Updated last year
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM ☆24 · Feb 10, 2026 · Updated 3 months ago
- [NeurIPS2023] Neural-Logic Human-Object Interaction Detection ☆14 · Aug 24, 2024 · Updated last year
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆16 · May 21, 2024 · Updated last year
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning ☆35 · Jan 14, 2026 · Updated 4 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding ☆52 · Sep 21, 2025 · Updated 7 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆54 · Jun 12, 2025 · Updated 11 months ago
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner. ☆20 · Dec 1, 2023 · Updated 2 years ago
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" ☆29 · Apr 16, 2024 · Updated 2 years ago
- [CVPR'26] UniGame code implementation ☆19 · Apr 21, 2026 · Updated 3 weeks ago
- ☆22 · Oct 21, 2024 · Updated last year
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization (IJCAI-22) ☆12 · Oct 11, 2022 · Updated 3 years ago
- ☆12 · Jun 19, 2024 · Updated last year
- ☆23 · Aug 20, 2024 · Updated last year
- Open Set Video HOI detection from Action-centric Chain-of-Look Prompting, ICCV2023 ☆12 · Oct 3, 2023 · Updated 2 years ago
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025) ☆58 · Jun 9, 2025 · Updated 11 months ago
- Official Implementation of SnAG (CVPR 2024) ☆59 · Apr 26, 2025 · Updated last year
- [NeurIPS 25] The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning ☆27 · Sep 21, 2025 · Updated 7 months ago
- [AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding ☆125 · Nov 12, 2025 · Updated 6 months ago
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models ☆47 · Nov 20, 2025 · Updated 5 months ago
- [CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning ☆80 · Mar 25, 2026 · Updated last month
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆49 · Jul 7, 2025 · Updated 10 months ago
- ☆11 · Jun 27, 2023 · Updated 2 years ago
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models ☆30 · Mar 18, 2026 · Updated 2 months ago
- ☆10 · Dec 3, 2024 · Updated last year
- ☆16 · Sep 11, 2025 · Updated 8 months ago
- Sparking "Thinking with Videos" via Reinforcement Learning ☆157 · Oct 30, 2025 · Updated 6 months ago
- [CVPR2026] Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning ☆46 · Mar 27, 2026 · Updated last month
- ☆54 · Feb 9, 2026 · Updated 3 months ago
- ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25) ☆21 · Apr 2, 2025 · Updated last year
- The official implementation of Hard Negative Sampling via Large Language Models for Recommendation. ☆11 · Jan 17, 2026 · Updated 4 months ago
- ☆13 · Jul 3, 2024 · Updated last year
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆127 · Apr 16, 2026 · Updated last month
- ☆18 · May 7, 2025 · Updated last year
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆42 · Jan 27, 2026 · Updated 3 months ago
- ☆44 · Jan 4, 2026 · Updated 4 months ago
- This repo holds the official code for the paper "FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image Segmentation". ☆24 · Jan 2, 2024 · Updated 2 years ago
- [ACL 2026 Main] Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding ☆25 · Nov 21, 2025 · Updated 5 months ago