ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning in Image Generation
☆635 Updated 3 weeks ago
Alternatives and similar repositories for Image-Generation-CoT:
Users who are interested in Image-Generation-CoT are comparing it to the libraries listed below.
- Liquid: Language Models are Scalable and Unified Multi-modal Generators ☆517 Updated 2 weeks ago
- A family of versatile and state-of-the-art video tokenizers. ☆382 Updated 2 weeks ago
- This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV 2024 Oral ☆394 Updated 8 months ago
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆302 Updated last month
- [ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,350 Updated 3 weeks ago
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. ☆294 Updated 9 months ago
- Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥 ☆564 Updated 3 months ago
- [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,053 Updated 6 months ago
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models ☆910 Updated last month
- ☆100 Updated last month
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆477 Updated this week
- GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities ☆240 Updated 2 weeks ago
- Evaluating text-to-image/video/3D models with VQAScore ☆284 Updated last month
- ☆126 Updated 2 weeks ago
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆562 Updated 10 months ago
- Official implementation for "Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model" (NeurIPS 2024) ☆252 Updated 6 months ago
- [CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition ☆674 Updated last week
- 🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos ☆1,060 Updated last week
- [ICML 2023 Oral, NeurIPS 2023] Official implementations for paper: Customizable Image Synthesis with Multiple Subjects ☆434 Updated last year
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆316 Updated last month
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆334 Updated last month
- [CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation ☆1,129 Updated last month
- Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers" ☆629 Updated last month
- An official implementation of VideoRoPE: What Makes for Good Video Rotary Position Embedding? ☆127 Updated 2 weeks ago
- [NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation ☆199 Updated last week
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ☆240 Updated this week
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆508 Updated this week
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆230 Updated last month
- [ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation ☆290 Updated 9 months ago
- Visualization of DiT self attention features ☆198 Updated 8 months ago