hwanyu112 / VIBE-Benchmark
☆20Updated last week
Alternatives and similar repositories for VIBE-Benchmark
Users who are interested in VIBE-Benchmark are comparing it to the repositories listed below
- A framework for a unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating…☆128Updated last month
- Official implementation of MC-LLaVA.☆140Updated 3 months ago
- This is a collection of recent papers on reasoning in video generation models.☆95Updated last month
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval☆99Updated 3 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆103Updated 7 months ago
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation☆123Updated last month
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆80Updated 3 months ago
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆101Updated last week
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆349Updated last month
- [NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding☆64Updated 3 months ago
- Official codebase for the paper Latent Visual Reasoning☆109Updated 3 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness☆64Updated 6 months ago
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling☆187Updated 2 weeks ago
- ☆16Updated last month
- [ICCV 2025] FonTS: Text Rendering with Typography and Style Controls☆36Updated 3 months ago
- EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multi-choice questions that evaluate multimodal large …☆13Updated 10 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆113Updated 2 months ago
- [ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti…☆361Updated last week
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆113Updated 2 months ago
- [ICLR 2026] Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing"☆58Updated 2 weeks ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"☆90Updated 5 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆240Updated 6 months ago
- The first HEVC style Vision Transformer with advanced multimodal capabilities☆83Updated this week
- [NeurIPS 2025] The official PyTorch implementation of the "Vision Function Layer in MLLM".☆27Updated last month
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆109Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆175Updated last month
- Awesome latest models, datasets and benchmarks on streaming/online video understanding.☆23Updated 3 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆46Updated 2 months ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆60Updated 2 months ago
- Code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation"☆21Updated 2 months ago