Heven-Pan / UFVideo
UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
☆35 Updated last month
Alternatives and similar repositories for UFVideo
Users who are interested in UFVideo are comparing it to the repositories listed below.
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding ☆85 Updated last month
- [SIGIR'2024 Best Paper Honorable Mention] Official repository for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Compose… ☆72 Updated 10 months ago
- 🔥 OneThinker: All-in-one Reasoning Model for Image and Video ☆394 Updated last week
- [NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation ☆115 Updated last year
- High Quality Video Reasoning Segmentation ☆144 Updated 2 months ago
- [CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" ☆284 Updated last year
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources. ☆565 Updated last month
- [AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding ☆117 Updated 3 months ago
- 🔥[NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies ☆224 Updated 9 months ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models. ☆77 Updated 7 months ago
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆68 Updated 9 months ago
- [ACM MM'2024] Official repository for "Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval" ☆43 Updated last year
- Official Repository of OmniCaptioner ☆169 Updated 9 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models ☆123 Updated last year
- [Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval ☆140 Updated 5 months ago
- **Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos. ☆351 Updated 3 months ago
- [ICCV 2023] Spectrum-guided Multi-granularity Referring Video Object Segmentation. ☆110 Updated 10 months ago
- [Neurocomputing] Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation ☆22 Updated last month
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++ ☆216 Updated last week
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆186 Updated last week
- [NeurIPS 2025] Efficient Reasoning Vision Language Models ☆448 Updated 4 months ago
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning ☆40 Updated 3 weeks ago
- [ACM CSUR 2025] Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances ☆163 Updated last month
- Explain Before You Answer: A Survey on Compositional Visual Reasoning ☆307 Updated 3 months ago
- ☆128 Updated 4 months ago
- CoS: Chain-of-Shot Prompting for Long Video Understanding ☆53 Updated last year
- [NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering ☆105 Updated 10 months ago
- [NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,240 Updated 3 weeks ago
- (ICCV-2025 Official Code) Improving Generalist Model with Domain-Specific Experts ☆87 Updated 3 months ago
- First Video Deep Research Benchmark ☆140 Updated 3 weeks ago