yongliang-wu / Repurpose
[AAAI 2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
☆17 · Updated 6 months ago
Alternatives and similar repositories for Repurpose
Users interested in Repurpose are comparing it to the repositories listed below
- [CVPR 2025] Number it: Temporal Grounding Videos like Flipping Manga ☆124 · Updated last month
- [NeurIPS 2023] Exploring Diverse In-Context Configurations for Image Captioning ☆42 · Updated 11 months ago
- Accepted by CVPR 2024 ☆39 · Updated last year
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆156 · Updated 7 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆134 · Updated 2 months ago
- R1-like Video-LLM for Temporal Grounding ☆123 · Updated 4 months ago
- Official repository of the NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding" ☆37 · Updated 9 months ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ☆89 · Updated 10 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online ☆72 · Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆127 · Updated 2 months ago
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version) ☆16 · Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆228 · Updated 2 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆292 · Updated 9 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆125 · Updated 3 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆139 · Updated 10 months ago
- Official code for MotionBench (CVPR 2025) ☆59 · Updated 8 months ago
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models ☆324 · Updated 3 weeks ago
- A framework for a unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆123 · Updated last month
- [NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆220 · Updated 3 weeks ago
- Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding ☆52 · Updated 2 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆107 · Updated 5 months ago
- Two lines of code to add absolute time awareness to Qwen2.5VL's MRoPE ☆26 · Updated last month
- [ICLR 2025] Official code for the paper "MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs" ☆288 · Updated 6 months ago
- [ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs" ☆20 · Updated last week
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆109 · Updated 2 weeks ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆71 · Updated last month
- Collections of Papers and Projects for Multimodal Reasoning ☆104 · Updated 6 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆159 · Updated last month
- [NeurIPS 2025 D&B Oral] Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing" ☆113 · Updated 2 weeks ago
- Official implementation of MC-LLaVA ☆140 · Updated 2 months ago