shulin16/MMInA

[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents

☆54

Alternatives and similar repositories for MMInA

Users who are interested in MMInA are comparing it to the libraries listed below.

VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆68 · Oct 19, 2024 · Updated last year
pufanyi / syphus
Syphus: Automatic Instruction-Response Generation Pipeline
☆14 · Dec 14, 2023 · Updated 2 years ago
synvo-ai / HippoCamp
A benchmark for evaluating contextual agents on realistic multimodal personal-computer environments with profiling and factual-retention …
☆29 · Apr 2, 2026 · Updated 3 months ago
OSU-NLP-Group / Mind2Web-2
[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge
☆111 · May 17, 2026 · Updated 2 months ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆106 · Nov 9, 2023 · Updated 2 years ago
penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆20 · May 22, 2025 · Updated last year
Luodian / GenBench
Benchmarking and Analyzing Generative Data for Visual Recognition
☆26 · Jul 25, 2023 · Updated 2 years ago
JHU-CLSP / turking-bench
Web-grounded natural language instructions
☆18 · Nov 25, 2024 · Updated last year
magicgh / Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆16 · Oct 12, 2024 · Updated last year
McGill-NLP / weblinx
WebLINX is a benchmark for building web navigation agents with conversational capabilities
☆162 · Feb 11, 2025 · Updated last year
Vchitect / Uni-MMMU
[ACL2026 oral] Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
☆25 · Apr 13, 2026 · Updated 3 months ago
V3Det / mmdetection-V3Det
OpenMMLab Detection Toolbox and Benchmark for V3Det
☆15 · Apr 3, 2024 · Updated 2 years ago
web-arena-x / visualwebarena
VisualWebArena is a benchmark for multimodal agents.
☆482 · Nov 9, 2024 · Updated last year
RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆143 · Mar 1, 2026 · Updated 4 months ago
synvo-ai / local-cocoa
A local AI assistant running on your device. It turns your files into actionable memory.
☆55 · Mar 24, 2026 · Updated 3 months ago
OSU-NLP-Group / Explorer
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
☆29 · Feb 17, 2026 · Updated 5 months ago
peterljq / Tutorial-of-Data-Distillation-and-Condensation
A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …
☆13 · Dec 1, 2022 · Updated 3 years ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆149 · Nov 26, 2024 · Updated last year
camenduru / bria-rmbg-jupyter
☆14 · Mar 12, 2024 · Updated 2 years ago
DTennant / distill_visual_priors
2nd place solution of ECCV 2020 workshop VIPriors Image Classification Challenge, https://arxiv.org/abs/2008.00261
☆13 · Aug 22, 2021 · Updated 4 years ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆220 · Feb 8, 2024 · Updated 2 years ago
xmed-lab / HCGNet
J-BHI 2024: Exploiting Hierarchical Interactions for Protein Surface Learning
☆17 · Jan 21, 2024 · Updated 2 years ago
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated 2 years ago
ex3ndr / supervoice-gpt-facodec
GPT for FACodec
☆13 · Mar 25, 2024 · Updated 2 years ago
fsndzomga / open_source_lrm
☆10 · Oct 24, 2024 · Updated last year
GasolSun36 / SURf
[EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
☆11 · Oct 11, 2024 · Updated last year
camenduru / MusePose-jupyter
☆19 · May 29, 2024 · Updated 2 years ago
StelaBou / Diffusion-Act
☆25 · Sep 5, 2025 · Updated 10 months ago
king159 / svd-mv
Unofficial Implementation of "Stable Video Diffusion Multi-View"
☆79 · Apr 15, 2024 · Updated 2 years ago
reka-ai / research-eval
A benchmark to evaluate search-augmented LLMs
☆17 · Aug 28, 2025 · Updated 10 months ago
Dongqi-Fan / LIR
This is the official repository for "LIR: Efficient Degradation Removal for Lightweight Image Restoration"
☆16 · Jun 9, 2024 · Updated 2 years ago
king159 / Pair-Net
[IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation
☆101 · Nov 20, 2024 · Updated last year
Dongping-Chen / GUI-World
(ICLR 2025) The Official Code Repository for GUI-World.
☆69 · Dec 18, 2024 · Updated last year
camenduru / sliders-colab
☆32 · Jan 25, 2024 · Updated 2 years ago
Ropedia / S-Agent
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
☆73 · Jun 26, 2026 · Updated 3 weeks ago
showlab / videogui
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
☆53 · Feb 22, 2026 · Updated 4 months ago
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆82 · Mar 6, 2026 · Updated 4 months ago
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆314 · Mar 11, 2026 · Updated 4 months ago
AntonotnaWang / HINT
[CVPR 2022] HINT: Hierarchical Neuron Concept Explainer
☆20 · Apr 19, 2023 · Updated 3 years ago