64327069 / LVAgent
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆28 Nov 24, 2025 Updated 2 months ago
Alternatives and similar repositories for LVAgent
Users that are interested in LVAgent are comparing it to the libraries listed below
- Agentic Keyframe Search for Video Question Answering ☆15 Apr 7, 2025 Updated 10 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding ☆47 Sep 21, 2025 Updated 4 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos ☆12 Jun 11, 2024 Updated last year
- WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning ☆49 Dec 30, 2025 Updated last month
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆16 May 21, 2024 Updated last year
- ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25) ☆18 Apr 2, 2025 Updated 10 months ago
- ☆23 Aug 20, 2024 Updated last year
- [MICCAI 2024] VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks ☆27 Jan 13, 2026 Updated last month
- [AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding ☆117 Nov 12, 2025 Updated 3 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow" ☆31 Apr 20, 2025 Updated 9 months ago
- ☆18 Jun 10, 2025 Updated 8 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 Oct 14, 2024 Updated last year
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆49 Jul 7, 2025 Updated 7 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆38 Jan 27, 2026 Updated 2 weeks ago
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025) ☆56 Jun 9, 2025 Updated 8 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 Apr 12, 2023 Updated 2 years ago
- The repository of VG-Refiner paper ☆17 Dec 9, 2025 Updated 2 months ago
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 Jul 10, 2025 Updated 7 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆45 Jul 1, 2025 Updated 7 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆43 Apr 10, 2025 Updated 10 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆113 Dec 12, 2025 Updated 2 months ago
- ☆36 Jul 9, 2025 Updated 7 months ago
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆61 Feb 4, 2026 Updated last week
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 Jun 17, 2024 Updated last year
- ☆12 Jun 19, 2024 Updated last year
- [NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models" ☆28 Sep 18, 2025 Updated 4 months ago
- Implementation of various handwritten text line segmentation ☆10 Jan 6, 2020 Updated 6 years ago
- ☆13 Jul 3, 2024 Updated last year
- (ICCV 2021) Official PyTorch implementation of "Learning to Discover Reflection Symmetry via Polar Matching Convolution." ☆13 Aug 31, 2021 Updated 4 years ago
- ☆15 Nov 27, 2025 Updated 2 months ago
- Whole Heart MRI Segmenter based on data from HVSMR MICCAI 2016 Challenge ☆11 Apr 25, 2020 Updated 5 years ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆49 Mar 24, 2025 Updated 10 months ago
- [NeurIPS 2024] HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting ☆44 Dec 24, 2024 Updated last year
- ECCV24 "ReMamber: Referring Image Segmentation with Mamba Twister" official repository. ☆44 Jul 11, 2024 Updated last year
- Research works from Tencent AI Lab regarding self-evolving agents ☆82 Jan 30, 2026 Updated 2 weeks ago
- ☆11 Sep 27, 2023 Updated 2 years ago
- [NeurIPS 2024 poster] Cross-model Control: Improving Multiple Large Language Models in One-time Training ☆14 Oct 25, 2024 Updated last year
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding ☆13 May 9, 2025 Updated 9 months ago
- ☆24 Nov 27, 2025 Updated 2 months ago