Heven-Pan / UFVideo
UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
☆35 · Updated last month
Alternatives and similar repositories for UFVideo
Users who are interested in UFVideo are comparing it to the repositories listed below
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding ☆85 · Updated 3 weeks ago
- [SIGIR'2024 Best Paper Honorable Mention] Official repository for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Compose… ☆72 · Updated 10 months ago
- [NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation ☆115 · Updated last year
- 🔥 OneThinker: All-in-one Reasoning Model for Image and Video ☆394 · Updated this week
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources. ☆565 · Updated last month
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models. ☆77 · Updated 7 months ago
- [AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding ☆117 · Updated 3 months ago
- 🔥[NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies ☆224 · Updated 9 months ago
- [CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" ☆284 · Updated last year
- High Quality Video Reasoning Segmentation ☆144 · Updated 2 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models ☆123 · Updated last year
- Official Repository of OmniCaptioner ☆168 · Updated 9 months ago
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆68 · Updated 9 months ago
- [Neurocomputing] Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation ☆22 · Updated last month
- [ACM MM'2024] Official repository for "Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval" ☆42 · Updated last year
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++ ☆216 · Updated last week
- [Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval ☆140 · Updated 5 months ago
- **Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos. ☆351 · Updated 3 months ago
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning ☆40 · Updated 3 weeks ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆186 · Updated last week
- [NeurIPS 2025] Efficient Reasoning Vision Language Models ☆449 · Updated 4 months ago
- First Video Deep Research Benchmark ☆140 · Updated 3 weeks ago
- (ICCV-2025 Official Code) Improving Generalist Model with Domain-Specific Experts ☆87 · Updated 3 months ago
- [ACM CSUR 2025] Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances ☆163 · Updated last month
- [ICCV 2023] Spectrum-guided Multi-granularity Referring Video Object Segmentation. ☆110 · Updated 10 months ago
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" ☆137 · Updated 3 months ago
- CoS: Chain-of-Shot Prompting for Long Video Understanding ☆53 · Updated 11 months ago
- A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale. ☆708 · Updated 3 weeks ago
- ☆128 · Updated 4 months ago
- 🔥 [AAAI 2026 Oral] Official code for Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptat… ☆75 · Updated last year