mbzuai-oryx/Agent-X

ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

☆43

Alternatives and similar repositories for Agent-X

Users that are interested in Agent-X are comparing it to the libraries listed below.

umair1221 / WorldCache
WorldCache: Content-Aware Caching for Accelerated Video World Models
☆21 · Jun 28, 2026 · Updated last month
hananshafi / MTL-ViT
A new multi-task learning framework using Vision Transformers
☆11 · Jun 19, 2024 · Updated 2 years ago
umair1221 / AI-in-Agriculture
A repo for survey paper
☆17 · Jul 31, 2025 · Updated 11 months ago
Muhammad-Huzaifaa / ObjectCompose
[ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes 🚀🚀🚀
☆37 · Jan 21, 2025 · Updated last year
akhtarvision / weather-regional
☆11 · Oct 29, 2024 · Updated last year
mbzuai-oryx / DriveLMM-o1
Reasoning DriveLMM
☆15 · Mar 15, 2025 · Updated last year
HashmatShadab / Robustness-of-Volumetric-Medical-Segmentation-Models
[BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
☆15 · Nov 1, 2024 · Updated last year
AkashGhosh / CarePilot
[CVPR Findings 2026] CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
☆21 · Apr 4, 2026 · Updated 3 months ago
mbzuai-oryx / VideoMathQA
VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos
☆24 · May 7, 2026 · Updated 2 months ago
mbzuai-oryx / MIRA
[ACM MM 2025 🔥🔥 ] MIRA: A first-of-its-kind medical RAG framework that fuses image features and retrieved knowledge with dynamic contex…
☆23 · Aug 28, 2025 · Updated 11 months ago
brucewlee / self-incrimination
Code used for "Training Agents to Self-Report Misbehavior"
☆18 · Feb 27, 2026 · Updated 5 months ago
HashmatShadab / HSAT
[MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
☆12 · Jun 17, 2025 · Updated last year
rohit901 / VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
☆24 · Aug 19, 2025 · Updated 11 months ago
GAIR-NLP / Med
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-…
☆23 · May 15, 2026 · Updated 2 months ago
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
SalesforceAIResearch / LATTE
☆70 · Jun 2, 2026 · Updated last month
awaisrauf / agroGPT
☆39 · Jan 9, 2025 · Updated last year
hananshafi / MedContext
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"
☆14 · Nov 1, 2024 · Updated last year
mbzuai-oryx / EvoLMM
Self Evolving Large Multimodal Models with Continuous Rewards
☆25 · Jun 9, 2026 · Updated last month
Hasindri / HLSS
[MICCAI 2024 🔥] HLSS, the first study to explore hierarchical information inherent in histopathology images and their language descripti…
☆27 · Aug 5, 2024 · Updated last year
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆50 · Aug 23, 2024 · Updated last year
rhfeiyang / slurm-tutorial
SLURM Tutorial
☆29 · May 29, 2025 · Updated last year
ShaneXiangH / AIGVE_Tool
a Video Quality Analysis Toolkit
☆14 · May 16, 2025 · Updated last year
microsoft / MM-WebAgent
Build coherent and visually polished multimodal webpages with hierarchical planning, AIGC tools, and iterative reflection.
☆15 · May 17, 2026 · Updated 2 months ago
umair1221 / AgriCLIP
A code
☆29 · Jan 23, 2025 · Updated last year
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
GAIR-NLP / AgencyBench
[ACL2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
☆90 · Jan 23, 2026 · Updated 6 months ago
uiuc-kang-lab / agentic-benchmarks
☆60 · Jul 31, 2025 · Updated 11 months ago
MedHallu / MedHallu
☆24 · Apr 25, 2025 · Updated last year
AlignGPT-VL / AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
☆34 · Jul 12, 2024 · Updated 2 years ago
Amshaker / Mobile-O
[CVPR'26 Demo] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
☆154 · Apr 13, 2026 · Updated 3 months ago
akhtarvision / cal-detr
☆42 · Nov 9, 2023 · Updated 2 years ago
FudanCVL / AVI-Bench
[ICML'26] Toward Human-like Audio-Visual Intelligence of Omni-MLLMs
☆16 · Jun 20, 2026 · Updated last month
mbzuai-oryx / LongShOT
A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos
☆21 · Jun 20, 2026 · Updated last month
mbzuai-oryx / Video-R2
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19 · Jan 21, 2026 · Updated 6 months ago
fahadshamshad / deep-facial-privacy-prior
[ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors".
☆12 · Oct 11, 2024 · Updated last year
mbzuai-oryx / AIN
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding…
☆55 · Mar 13, 2025 · Updated last year
Muzammal-Naseer / SAT
Official repository for "Stylized Adversarial Training" (TPAMI 2022)
☆11 · Dec 30, 2022 · Updated 3 years ago
techmn / cosnet
A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025)
☆12 · Aug 11, 2025 · Updated 11 months ago