google-deepmind / proactive_t2i_agents
Code release for the paper, "Proactive Agents for Text-to-Image Generation under Uncertainty"
☆56 Updated 3 months ago
Alternatives and similar repositories for proactive_t2i_agents
Users who are interested in proactive_t2i_agents are comparing it to the libraries listed below
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆104 Updated 2 months ago
- ☆68 Updated last month
- Official PyTorch implementation of TokenSet. ☆126 Updated 7 months ago
- The open-source code of MetaStone-S1. ☆107 Updated 2 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆91 Updated last year
- Official repo of paper LM2 ☆47 Updated 8 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind ☆57 Updated 4 months ago
- Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding ☆52 Updated 10 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆84 Updated 4 months ago
- ☆87 Updated 5 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆46 Updated 8 months ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆142 Updated last year
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆96 Updated 3 months ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆155 Updated 8 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆128 Updated 3 months ago
- [ICCV2025] WikiAutoGen official page ☆20 Updated 4 months ago
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ☆58 Updated last week
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆307 Updated 5 months ago
- ☆219 Updated 8 months ago
- ☆55 Updated 11 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs ☆193 Updated last year
- Modifying Large Language Models Post-training for Diverse Creative Writing ☆51 Updated 5 months ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. ☆118 Updated last month
- ACL 2025: Synthetic data generation pipelines for text-rich images. ☆143 Updated 7 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆45 Updated 4 months ago
- Inference-time scaling of diffusion-based image and video generation models. ☆169 Updated 4 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆113 Updated 3 months ago
- ☆93 Updated 4 months ago
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆35 Updated 8 months ago
- Implementation of a framework for Genie2 in Pytorch ☆153 Updated 9 months ago