google-deepmind / proactive_t2i_agents
Code release for the paper, "Proactive Agents for Text-to-Image Generation under Uncertainty"
☆49Updated last month
Alternatives and similar repositories for proactive_t2i_agents
Users who are interested in proactive_t2i_agents are comparing it to the libraries listed below.
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆53Updated 8 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆97Updated 3 weeks ago
- ☆87Updated 3 months ago
- ☆56Updated 9 months ago
- The open-source code of MetaStone-S1.☆107Updated last month
- ☆95Updated 2 months ago
- Official PyTorch implementation of TokenSet.☆121Updated 5 months ago
- ☆62Updated 6 months ago
- ☆35Updated 2 years ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities☆157Updated 6 months ago
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis☆104Updated 2 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆107Updated last month
- Implementation of the premier Text to Video model from OpenAI☆56Updated 9 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆46Updated 6 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆89Updated 10 months ago
- Multi-Granularity LLM Debugger☆89Updated last month
- ☆91Updated last month
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos☆44Updated 2 months ago
- ☆213Updated 6 months ago
- ☆86Updated 11 months ago
- ☆63Updated 11 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 6 months ago
- Code for ExploreToM☆86Updated 2 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆81Updated 2 months ago
- Enhancement in Multimodal Representation Learning.☆40Updated last year
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D…☆37Updated 6 months ago
- Video-LLaVA fine-tune for CinePile evaluation☆51Updated last year
- ControlLLM: Augment Language Models with Tools by Searching on Graphs☆193Updated last year
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost☆84Updated last month
- ☆84Updated 11 months ago