[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆48Feb 27, 2025Updated last year
Alternatives and similar repositories for MMInA
Users who are interested in MMInA are comparing it to the libraries listed below
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆64Oct 19, 2024Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆105Nov 9, 2023Updated 2 years ago
- ☆25Sep 5, 2025Updated 5 months ago
- ☆21Feb 13, 2026Updated 2 weeks ago
- ☆14Oct 16, 2023Updated 2 years ago
- Web-grounded natural language instructions☆18Nov 25, 2024Updated last year
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Dec 14, 2023Updated 2 years ago
- This repo contains self made projects and learnables from various resources on using local LLMs and RAG☆14May 26, 2025Updated 9 months ago
- Computer-Use Agents as Judges for Generative UI☆43Nov 27, 2025Updated 3 months ago
- ☆15Mar 12, 2024Updated last year
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM☆20May 22, 2025Updated 9 months ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities☆160Feb 11, 2025Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆19Jun 27, 2024Updated last year
- ☆16Apr 7, 2024Updated last year
- ☆16Apr 23, 2024Updated last year
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene…☆33Aug 11, 2022Updated 3 years ago
- GPT-4V in Wonderland: LMMs as Smartphone Agents☆135Jul 17, 2024Updated last year
- Thumbnail maker/image editor using PHP☆19Aug 11, 2021Updated 4 years ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆148Nov 26, 2024Updated last year
- ☆58Apr 24, 2024Updated last year
- [CVPR 2024] 🏡Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning☆81Apr 5, 2024Updated last year
- Some preliminary explorations of Mamba's context scaling.☆218Feb 8, 2024Updated 2 years ago
- [TMLR 2025] The official repository of the paper "Unsupervised Discovery of Object-Centric Neural Fields"☆18Feb 15, 2026Updated 2 weeks ago
- Submission to the inverse scaling prize☆23Jul 23, 2023Updated 2 years ago
- ☆18May 29, 2024Updated last year
- VisualWebArena is a benchmark for multimodal agents.☆436Nov 9, 2024Updated last year
- ☆19Jul 11, 2024Updated last year
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆136Jul 17, 2024Updated last year
- ☆38Jan 19, 2026Updated last month
- ☆12Sep 26, 2023Updated 2 years ago
- ☆20Apr 26, 2024Updated last year
- Almost isotropic remeshing☆26Sep 18, 2023Updated 2 years ago
- ☆19Apr 23, 2025Updated 10 months ago
- BH hackathon☆14Apr 4, 2024Updated last year
- ☆11Feb 9, 2024Updated 2 years ago
- Codebase for the paper HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View☆13Jun 5, 2024Updated last year
- Frontend (and soon also middleware and backend) for a new, open-source image generation platform.☆14Nov 5, 2022Updated 3 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition☆26Jul 25, 2023Updated 2 years ago
- ☆29Updated this week