VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆311 · Feb 8, 2026 · Updated last month
Alternatives and similar repositories for VideoMind
Users who are interested in VideoMind are comparing it to the repositories listed below
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆143 · Aug 21, 2025 · Updated 7 months ago
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆74 · Jan 20, 2025 · Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆106 · Nov 28, 2024 · Updated last year
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs ☆114 · Mar 12, 2026 · Updated last week
- Video-R1: Reinforcing Video Reasoning in MLLMs [the first paper to explore R1 for video] ☆837 · Dec 14, 2025 · Updated 3 months ago
- R1-like Video-LLM for Temporal Grounding ☆135 · Jun 20, 2025 · Updated 9 months ago
- R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆90 · Jul 2, 2024 · Updated last year
- First-ever hour-scale video understanding models ☆616 · Jul 14, 2025 · Updated 8 months ago
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆1,128 · Aug 14, 2025 · Updated 7 months ago
- Paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi… ☆36 · Dec 27, 2025 · Updated 2 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆93 · Sep 2, 2025 · Updated 6 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Sep 13, 2024 · Updated last year
- Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding". ☆39 · Jun 9, 2025 · Updated 9 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆424 · May 8, 2025 · Updated 10 months ago
- PyTorch implementation of the ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary ☆17 · Jul 17, 2022 · Updated 3 years ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆137 · Jul 28, 2025 · Updated 7 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆146 · Dec 26, 2024 · Updated last year
- The code for PixelRefer & VideoRefer ☆345 · Nov 16, 2025 · Updated 4 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆115 · Dec 24, 2025 · Updated 2 months ago
- [ICLR 2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling ☆511 · Nov 18, 2025 · Updated 4 months ago
- ☆194 · Oct 14, 2024 · Updated last year
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P… ☆65 · Jan 27, 2026 · Updated last month
- ☆17 · Jan 26, 2025 · Updated last year
- LLaVA-Next for STVG ☆18 · Dec 5, 2025 · Updated 3 months ago
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆140 · Aug 21, 2025 · Updated 7 months ago
- FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV 2025) ☆35 · Apr 17, 2025 · Updated 11 months ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆84 · Feb 27, 2025 · Updated last year
- Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs" ☆91 · Jun 6, 2025 · Updated 9 months ago
- ☆49 · Sep 13, 2024 · Updated last year
- Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆180 · Feb 25, 2025 · Updated last year
- PyTorch implementation of the paper 'Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Super… ☆20 · Jan 19, 2024 · Updated 2 years ago
- ☆41 · Sep 9, 2025 · Updated 6 months ago
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams" ☆274 · Oct 15, 2025 · Updated 5 months ago
- ☆44 · Jul 9, 2025 · Updated 8 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆43 · Mar 11, 2025 · Updated last year
- [CVPR 2024] The official implementation of AdaTAD: End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames ☆40 · Jul 9, 2024 · Updated last year
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,469 · Jun 26, 2025 · Updated 8 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding ☆82 · Dec 14, 2025 · Updated 3 months ago
- ☆10 · Feb 14, 2025 · Updated last year