Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024
☆22Feb 15, 2024Updated 2 years ago
Alternatives and similar repositories for idea2img
Users that are interested in idea2img are comparing it to the libraries listed below
- ☆11Oct 22, 2023Updated 2 years ago
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"☆16Sep 2, 2024Updated last year
- CoV: Chain-of-View Prompting for Spatial Reasoning☆51Jan 23, 2026Updated last month
- ☆28Sep 2, 2025Updated 6 months ago
- [AAAI 2025] This repo contains evaluation code for the paper “UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in…☆35Apr 10, 2025Updated 10 months ago
- Cross-View Geolocalization and Disaster Mapping with Street-View and VHR Satellite Imagery: A Case Study of Hurricane IAN☆17Oct 3, 2024Updated last year
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆114Dec 24, 2025Updated 2 months ago
- Controllable image captioning model with unsupervised modes☆21Apr 14, 2023Updated 2 years ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos☆51Feb 22, 2026Updated 2 weeks ago
- Image captioning with weight pruning in PyTorch☆22Jan 14, 2022Updated 4 years ago
- The official pytorch implementation of Exploring the Interactive Guidance for Unified and Effective Image Matting [TOMM 2025]☆24Nov 24, 2025Updated 3 months ago
- Official Repository for Deterministic Neural Illuminant Mapping (DeNIM) published in ICCV2023 Workshops☆23Aug 7, 2023Updated 2 years ago
- [ICCV25 Highlight] The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"☆73Oct 22, 2025Updated 4 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- ☆17Aug 18, 2022Updated 3 years ago
- GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities☆305May 3, 2025Updated 10 months ago
- A simple script to see how my ideas evolve over time☆44Jun 4, 2025Updated 9 months ago
- ☆47Jan 26, 2026Updated last month
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆25May 26, 2025Updated 9 months ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data☆35Mar 12, 2024Updated last year
- ☆105Feb 4, 2026Updated last month
- ☆39Dec 4, 2023Updated 2 years ago
- [ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multi…☆175Feb 7, 2026Updated last month
- ☆38Aug 14, 2024Updated last year
- [ICCV 2025] Where am I? Cross-View Geo-localization with Natural Language Descriptions.☆63Dec 9, 2025Updated 3 months ago
- Code for our EMNLP 2022 paper: Generative Entity Typing with Curriculum Learning.☆13Aug 19, 2023Updated 2 years ago
- ☆18Aug 1, 2025Updated 7 months ago
- Implementation of "Learning Multiscale Convolutional Dictionaries for Image Reconstruction", IEEE Transactions on Computational Imaging, 2…☆32Apr 17, 2023Updated 2 years ago
- Accepted by TMM 2021☆41Jun 30, 2022Updated 3 years ago
- Implementations of the renormalization group-based diffusion model (RGDM).☆16Mar 10, 2025Updated 11 months ago
- Adds a node that generates a mostly consistent comic using LLM output☆12Sep 9, 2025Updated 6 months ago
- Code for "Sample-efficient Deep Reinforcement Learning of Mobile Manipulation for 6-DOF Trajectory Following"☆13Mar 19, 2025Updated 11 months ago
- Kohya's GUI docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experie…☆17Nov 15, 2024Updated last year
- 3D Editing via Propagation of Image Prompts to Multi-View☆18Nov 30, 2025Updated 3 months ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆29Sep 12, 2025Updated 5 months ago
- Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification☆10May 30, 2024Updated last year
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection☆55Aug 16, 2025Updated 6 months ago
- Gesture Recognition Based on ALTERA DE2-115 FPGA☆10Mar 18, 2014Updated 11 years ago
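- Reads depth and other image information with Kinect 2.0 and segments the hand region. HOG features are extracted from the hand image and an SVM is then trained on them, with the goal of recognizing gestures.☆10Jan 8, 2020Updated 6 years ago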