Yu-xm / ReVision
Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
☆24 · Updated this week
Alternatives and similar repositories for ReVision
Users who are interested in ReVision are comparing it to the repositories listed below.
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models ☆33 · Updated last month
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 · Updated 2 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Updated 11 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆86 · Updated 6 months ago
- ☆141 · Updated 3 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench… ☆86 · Updated last week
- ☆39 · Updated 8 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Updated 8 months ago
- ☆22 · Updated last month
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation ☆36 · Updated 4 months ago
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆52 · Updated last year
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation ☆32 · Updated 3 weeks ago
- [NeurIPS 2024] The official implement of research paper "FreeLong : Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆64 · Updated 7 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation ☆37 · Updated 7 months ago
- ☆81 · Updated 7 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆26 · Updated last year
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated last week
- ☆24 · Updated 8 months ago
- [ICCV 2025] Diffusion Curriculum (DisCL) ☆17 · Updated 4 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench ☆19 · Updated 6 months ago
- [CVPR 2025 AI4CC Workshop] Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editin… ☆35 · Updated 9 months ago
- ☆63 · Updated 6 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆74 · Updated 4 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆77 · Updated last year
- SFT+RL boosts multimodal reasoning ☆45 · Updated 7 months ago
- ☆35 · Updated 2 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆24 · Updated last year
- [ICLR 2025, AAAI 2026] official implementation of "Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generati… ☆34 · Updated last week
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆80 · Updated last month