zyang-ur / idea2img
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024
☆22Feb 15, 2024Updated 2 years ago
Alternatives and similar repositories for idea2img
Users who are interested in idea2img are comparing it to the repositories listed below.
- CoV: Chain-of-View Prompting for Spatial Reasoning☆50Jan 23, 2026Updated 3 weeks ago
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"☆16Sep 2, 2024Updated last year
- [AAAI 2025]This repo contains evaluation code for the paper “UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in…☆36Apr 10, 2025Updated 10 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆114Dec 24, 2025Updated last month
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting☆70Jan 9, 2026Updated last month
- Controllable image captioning model with unsupervised modes☆21Apr 14, 2023Updated 2 years ago
- This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.☆46Aug 22, 2025Updated 5 months ago
- Image captioning with weight pruning in PyTorch☆22Jan 14, 2022Updated 4 years ago
- The official pytorch implementation of Exploring the Interactive Guidance for Unified and Effective Image Matting [TOMM 2025]☆24Nov 24, 2025Updated 2 months ago
- WeChat official account: 机器感知 (Machine Perception) | Tracking the latest Layer Diffusion trends☆20Dec 1, 2024Updated last year
- Official Repository for Deterministic Neural Illuminant Mapping (DeNIM) published in ICCV2023 Workshops☆23Aug 7, 2023Updated 2 years ago
- [ICCV25 Highlight] The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"☆74Oct 22, 2025Updated 3 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation☆25Jul 8, 2023Updated 2 years ago
- ☆17Aug 18, 2022Updated 3 years ago
- OneBEV: Using One Panoramic Image for Bird’s-Eye-View Semantic Mapping☆32Jan 10, 2025Updated last year
- [ICCV 2025] The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”☆82Oct 17, 2025Updated 3 months ago
- GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities☆305May 3, 2025Updated 9 months ago
- A simple script to see how my ideas evolve over time☆44Jun 4, 2025Updated 8 months ago
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆26May 26, 2025Updated 8 months ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data☆35Mar 12, 2024Updated last year
- ☆47Jan 26, 2026Updated 3 weeks ago
- [ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multi…☆175Feb 7, 2026Updated last week
- ☆37Aug 14, 2024Updated last year
- [ICCV 2025] Where am I? Cross-View Geo-localization with Natural Language Descriptions.☆60Dec 9, 2025Updated 2 months ago
- ROS packages for control of an autonomous Renault Twizy at the Department of Electrical Engineering, Chalmers University of Technology, S…☆11May 30, 2021Updated 4 years ago
- ☆17Aug 1, 2025Updated 6 months ago
- Code for our EMNLP 2022 paper: Generative Entity Typing with Curriculum Learning.☆13Aug 19, 2023Updated 2 years ago
- GPT-4V(ision) as A Social Media Analysis Engine☆38Dec 20, 2024Updated last year
- [ICLR 2026] The official implementation of the paper “Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents”☆95Feb 1, 2026Updated 2 weeks ago
- Implementation of "Learning Multiscale Convolutional Dictionaries for Image Reconstruction", IEEE Transactions on Computational Imaging, 2…☆32Apr 17, 2023Updated 2 years ago
- Accepted by TMM 2021☆42Jun 30, 2022Updated 3 years ago
- Kohya's GUI docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experie…☆18Nov 15, 2024Updated last year
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection☆55Aug 16, 2025Updated 6 months ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"☆13May 12, 2023Updated 2 years ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆29Sep 12, 2025Updated 5 months ago
- Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification☆10May 30, 2024Updated last year
- ☆17Aug 5, 2025Updated 6 months ago
- Implementations of the renormalization group-based diffusion model (RGDM).☆16Mar 10, 2025Updated 11 months ago