Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆28Nov 24, 2025Updated 3 months ago
Alternatives and similar repositories for LVAgent
Users who are interested in LVAgent are comparing it to the repositories listed below
- Agentic Keyframe Search for Video Question Answering☆16Apr 7, 2025Updated 11 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding☆46Sep 21, 2025Updated 5 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆12Jun 11, 2024Updated last year
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…☆16May 21, 2024Updated last year
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner.☆20Dec 1, 2023Updated 2 years ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning (CVPR 2026)☆58Updated this week
- ☆23Aug 20, 2024Updated last year
- [MICCAI 2024] VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks☆27Jan 13, 2026Updated last month
- [AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding☆118Nov 12, 2025Updated 3 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"☆31Apr 20, 2025Updated 10 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models☆49Jul 7, 2025Updated 8 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…☆12Oct 14, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆38Jan 27, 2026Updated last month
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)☆56Jun 9, 2025Updated 9 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- official code for unigame☆19Nov 26, 2025Updated 3 months ago
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 3 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated 10 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆46Jul 1, 2025Updated 8 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆117Dec 12, 2025Updated 2 months ago
- ☆36Jul 9, 2025Updated 7 months ago
- (ICCV 2021) Official PyTorch implementation of "Learning to Discover Reflection Symmetry via Polar Matching Convolution."☆13Aug 31, 2021Updated 4 years ago
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆64Updated this week
- [NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models"☆28Sep 18, 2025Updated 5 months ago
- ☆10Apr 7, 2025Updated 11 months ago
- ☆13Jul 3, 2024Updated last year
- Implementation of various handwritten text line segmentation☆10Jan 6, 2020Updated 6 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆19Jul 10, 2025Updated 7 months ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- ☆12Jun 19, 2024Updated last year
- ☆11Jan 18, 2025Updated last year
- Whole Heart MRI Segmenter based on data from HVSMR MICCAI 2016 Challenge☆11Apr 25, 2020Updated 5 years ago
- ☆15Feb 12, 2026Updated 3 weeks ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification☆49Mar 24, 2025Updated 11 months ago
- ECCV24 "ReMamber: Referring Image Segmentation with Mamba Twister" official repository.☆45Jul 11, 2024Updated last year
- [NeurIPS 2024] HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting☆45Dec 24, 2024Updated last year
- Research works from Tencent AI Lab regarding self-evolving agents☆83Jan 30, 2026Updated last month