Yu-xm / Unicorn
Text-Only Data Synthesis for Vision Language Model Training
☆22Updated 6 months ago
Alternatives and similar repositories for Unicorn
Users that are interested in Unicorn are comparing it to the libraries listed below
- ☆140Updated 2 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Updated 10 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆31Updated last month
- ☆15Updated 7 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆85Updated 5 months ago
- ☆39Updated 7 months ago
- ICML2025☆62Updated 4 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆24Updated 6 months ago
- The code repository of UniRL☆47Updated 7 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing☆30Updated 3 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆44Updated 6 months ago
- ☆20Updated 3 weeks ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆111Updated 2 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆63Updated 6 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆199Updated 2 months ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆52Updated 5 months ago
- ☆56Updated 8 months ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning☆101Updated 7 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"☆74Updated last year
- Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation☆37Updated 5 months ago
- ☆63Updated 5 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆73Updated 3 months ago
- FQGAN: Factorized Visual Tokenization and Generation☆57Updated 9 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"☆72Updated 3 weeks ago
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)☆44Updated last month
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform…☆82Updated 3 months ago
- ICML 2025 - Impossible Videos☆82Updated 5 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 5 months ago
- Official implementation of MIA-DPO☆69Updated 11 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.☆50Updated last year