yongliang-wu / Repurpose
[AAAI 2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
☆17 · Updated 6 months ago
Alternatives and similar repositories for Repurpose
Users interested in Repurpose are comparing it to the repositories listed below
- [CVPR 2025] Number it: Temporal Grounding Videos like Flipping Manga ☆124 · Updated last month
- [NeurIPS 2023] Exploring Diverse In-Context Configurations for Image Captioning ☆42 · Updated 11 months ago
- Accepted by CVPR 2024 ☆39 · Updated last year
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆156 · Updated 7 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆134 · Updated 2 months ago
- R1-like Video-LLM for Temporal Grounding ☆123 · Updated 4 months ago
- Official repository of the NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding" ☆37 · Updated 9 months ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ☆89 · Updated 10 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online ☆72 · Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆127 · Updated 2 months ago
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version) ☆16 · Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆228 · Updated 2 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆292 · Updated 9 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆125 · Updated 3 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆139 · Updated 10 months ago
- Official code for MotionBench (CVPR 2025) ☆59 · Updated 8 months ago
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models ☆324 · Updated 3 weeks ago
- A framework for a unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆123 · Updated last month
- [NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆220 · Updated 3 weeks ago
- Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding ☆52 · Updated 2 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆107 · Updated 5 months ago
- Two lines of code to add absolute time awareness to Qwen2.5VL's MRoPE ☆26 · Updated last month
- [ICLR 2025] Official code for the paper "MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs" ☆288 · Updated 6 months ago
- [ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs" ☆20 · Updated last week
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆109 · Updated 2 weeks ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆71 · Updated last month
- Collections of Papers and Projects for Multimodal Reasoning ☆104 · Updated 6 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆159 · Updated last month
- [NeurIPS 2025 D&B Oral] Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing" ☆113 · Updated 2 weeks ago
- Official implementation of MC-LLaVA ☆140 · Updated 2 months ago