[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding
☆64 · Feb 10, 2026 · Updated last month
Alternatives and similar repositories for VideoLucy
Users that are interested in VideoLucy are comparing it to the libraries listed below
- [NeurIPS2025] ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model ☆84 · Jan 8, 2026 · Updated 2 months ago
- [NeurIPS2024] Cross-video Identity Correlating for Person Re-identification Pre-training ☆101 · Jun 20, 2025 · Updated 9 months ago
- Track 2: Social Navigation ☆24 · Aug 19, 2025 · Updated 7 months ago
- Track 1: Driving with Language ☆26 · Aug 23, 2025 · Updated 6 months ago
- [ICCV 2025] Perspective-Invariant 3D Object Detection ☆169 · Dec 22, 2025 · Updated 2 months ago
- [RAL'26] Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration ☆86 · Jan 11, 2026 · Updated 2 months ago
- [Technical Report] A Comprehensive Evaluation of Nano Banana Pro on 14 Low-Level Vision Tasks and 40 Datasets ☆71 · Dec 24, 2025 · Updated 2 months ago
- CVPR2022: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency ☆18 · Aug 10, 2022 · Updated 3 years ago
- [ECCV 2024] 4D Contrastive Superflows are Dense 3D Representation Learners ☆51 · Dec 4, 2025 · Updated 3 months ago
- This is the project for 'USG'. ☆37 · Apr 7, 2025 · Updated 11 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆20 · Aug 1, 2025 · Updated 7 months ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆41 · Oct 9, 2025 · Updated 5 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 · Dec 14, 2025 · Updated 3 months ago
- FlexEvent: Event Camera Object Detection at Arbitrary Frequencies ☆20 · Dec 10, 2024 · Updated last year
- 🌐 A Roadmap for 3D Scene Understanding in the Wild ☆26 · Dec 19, 2025 · Updated 3 months ago
- [CVPR 2026] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning ☆61 · Mar 5, 2026 · Updated 2 weeks ago
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs ☆114 · Mar 12, 2026 · Updated last week
- Official implementation of the paper "Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability". ☆51 · Dec 25, 2025 · Updated 2 months ago
- ☆13 · Feb 26, 2024 · Updated 2 years ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models? ☆50 · Dec 25, 2025 · Updated 2 months ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆37 · Feb 21, 2026 · Updated last month
- [ICLR 2026] FOCUS: Efficient Keyframe Selection for Long Video Understanding ☆53 · Feb 3, 2026 · Updated last month
- [CVPR2022] Official Implementation of the paper 'Learning Where to Learn in Cross-View Self-Supervised Learning' ☆29 · Oct 12, 2022 · Updated 3 years ago
- The official implementation of IEEE-TIP paper “Conditional Boundary Loss for Semantic Segmentation” ☆21 · Nov 20, 2023 · Updated 2 years ago
- ☆27 · Feb 12, 2026 · Updated last month
- This repository provides the PyTorch implementation of the paper: Anomaly Discovery in Semantic Segmentation via Distillation Comparison … ☆15 · Apr 18, 2023 · Updated 2 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- Track 5: Cross-Platform 3D Object Detection ☆21 · Aug 16, 2025 · Updated 7 months ago
- Official Implementation of Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training ☆94 · Mar 13, 2026 · Updated last week
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 · Jan 1, 2026 · Updated 2 months ago
- 🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future ☆338 · Mar 4, 2026 · Updated 2 weeks ago
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆38 · Sep 10, 2025 · Updated 6 months ago
- 2nd place solution of ECCV 2020 workshop VIPriors Image Classification Challenge, https://arxiv.org/abs/2008.00261 ☆13 · Aug 22, 2021 · Updated 4 years ago
- ☆15 · Aug 12, 2022 · Updated 3 years ago
- Agentic Keyframe Search for Video Question Answering ☆16 · Apr 7, 2025 · Updated 11 months ago
- [AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences ☆189 · Dec 12, 2025 · Updated 3 months ago
- [CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing ☆40 · Jan 12, 2026 · Updated 2 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆87 · Oct 15, 2025 · Updated 5 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago