Heven-Pan / UFVideo
UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
⭐ 35 · Updated last month
Alternatives and similar repositories for UFVideo
Users who are interested in UFVideo are comparing it to the libraries listed below.
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding · ⭐ 78 · Updated 3 weeks ago
- [SIGIR'2024 Best Paper Honorable Mention] Official repository for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Compose… · ⭐ 71 · Updated 10 months ago
- [NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation · ⭐ 115 · Updated last year
- Official Repository of OmniCaptioner · ⭐ 168 · Updated 9 months ago
- 🔥 OneThinker: All-in-one Reasoning Model for Image and Video · ⭐ 388 · Updated 3 weeks ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models. · ⭐ 77 · Updated 7 months ago
- High Quality Video Reasoning Segmentation · ⭐ 144 · Updated 2 months ago
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources. · ⭐ 562 · Updated last month
- [CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" · ⭐ 284 · Updated last year
- 🔥 [NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies · ⭐ 224 · Updated 9 months ago
- [AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding · ⭐ 112 · Updated 2 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models · ⭐ 122 · Updated last year
- [Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval · ⭐ 139 · Updated 5 months ago
- [ACM MM'2024] Official repository for "Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval" · ⭐ 42 · Updated last year
- [Neurocomputing] Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation · ⭐ 22 · Updated last month
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) · ⭐ 68 · Updated 8 months ago
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++ · ⭐ 214 · Updated 6 months ago
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" · ⭐ 134 · Updated 3 months ago
- **Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos. · ⭐ 346 · Updated 3 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better · ⭐ 186 · Updated this week
- [ICCV 2023] Spectrum-guided Multi-granularity Referring Video Object Segmentation. · ⭐ 110 · Updated 9 months ago
- [NeurIPS 2025] Efficient Reasoning Vision Language Models · ⭐ 449 · Updated 4 months ago
- (ICCV 2025 Official Code) Improving Generalist Model with Domain-Specific Experts · ⭐ 87 · Updated 3 months ago
- CoS: Chain-of-Shot Prompting for Long Video Understanding · ⭐ 53 · Updated 11 months ago
- [ACM CSUR 2025] Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances · ⭐ 162 · Updated last month
- First Video Deep Research Benchmark · ⭐ 134 · Updated 2 weeks ago
- [NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering · ⭐ 105 · Updated 9 months ago
- Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model · ⭐ 934 · Updated last month
- The official repository of SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization · ⭐ 156 · Updated last week
- [NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS · ⭐ 1,238 · Updated 2 weeks ago