NROwind / OpenGPT-4o-Image
A Comprehensive Dataset for Advanced Image Generation and Editing
☆29 · Updated last month
Alternatives and similar repositories for OpenGPT-4o-Image
Users that are interested in OpenGPT-4o-Image are comparing it to the libraries listed below
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated last year
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆37 · Updated 5 months ago
- Test-time Scaling for VAR models ☆25 · Updated 2 months ago
- ☆39 · Updated 6 months ago
- ☆30 · Updated 11 months ago
- ☆12 · Updated 9 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆19 · Updated 9 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆60 · Updated 4 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆167 · Updated last week
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆77 · Updated 4 months ago
- [NeurIPS25] Official implementation (PyTorch) of "DeepVideo-R1" ☆28 · Updated this week
- ☆132 · Updated last month
- ☆78 · Updated 4 months ago
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆50 · Updated 11 months ago
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025) ☆43 · Updated last month
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆48 · Updated 7 months ago
- The code repository of UniRL ☆46 · Updated 5 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆31 · Updated 3 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 · Updated 6 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆101 · Updated 2 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 · Updated 9 months ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆57 · Updated 4 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆25 · Updated last year
- ☆62 · Updated 4 months ago
- Text-Only Data Synthesis for Vision Language Model Training ☆22 · Updated 5 months ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆100 · Updated 5 months ago
- Official repository for ReasonGen-R1 ☆73 · Updated 4 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆51 · Updated 3 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆41 · Updated 7 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆181 · Updated last month