mbzuai-oryx / Agent-X
Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆36 · Nov 27, 2025 · Updated 2 months ago
Alternatives and similar repositories for Agent-X
Users who are interested in Agent-X are comparing it to the repositories listed below.
- a Video Quality Analysis Toolkit ☆13 · May 16, 2025 · Updated 8 months ago
- A new multi-task learning framework using Vision Transformers ☆11 · Jun 19, 2024 · Updated last year
- ☆11 · Oct 29, 2024 · Updated last year
- Neural ODE Transformers (ICLR 2025) ☆17 · Sep 6, 2025 · Updated 5 months ago
- ☆23 · Oct 30, 2025 · Updated 3 months ago
- [NAACL'25] Contains code and documentation for our VANE-Bench paper. ☆17 · Aug 19, 2025 · Updated 5 months ago
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos ☆22 · Jan 26, 2026 · Updated 2 weeks ago
- ☆14 · Nov 26, 2021 · Updated 4 years ago
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ☆50 · Aug 23, 2024 · Updated last year
- A code ☆29 · Jan 23, 2025 · Updated last year
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆34 · Sep 1, 2025 · Updated 5 months ago
- [CVPRW 2025] Official repository of paper titled "Towards Evaluating the Robustness of Visual State Space Models" ☆25 · Jun 8, 2025 · Updated 8 months ago
- MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots ☆70 · Dec 5, 2025 · Updated 2 months ago
- Benchmark and model for step-by-step reasoning in autonomous driving. ☆69 · Mar 15, 2025 · Updated 10 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 · May 27, 2025 · Updated 8 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability" ☆34 · Jul 12, 2024 · Updated last year
- Evaluate the Quality of Critique ☆36 · Jun 1, 2024 · Updated last year
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25] ☆182 · Jun 5, 2025 · Updated 8 months ago
- image retrieval using metric learning ☆10 · Nov 22, 2022 · Updated 3 years ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆104 · Sep 18, 2025 · Updated 4 months ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆42 · Jul 19, 2024 · Updated last year
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues ☆13 · Feb 7, 2024 · Updated 2 years ago
- ☆10 · Oct 5, 2022 · Updated 3 years ago
- [ICCV2025] Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation ☆14 · Aug 20, 2025 · Updated 5 months ago
- I saw this [Blog Post](https://www.morling.dev/blog/one-billion-row-challenge/) on a Billion Row challenge for Java so naturally I tried … ☆14 · Jan 10, 2024 · Updated 2 years ago
- GRPO to train long-form QA and instructions with a long-form reward model ☆16 · Jul 17, 2025 · Updated 6 months ago
- ☆13 · Sep 23, 2022 · Updated 3 years ago
- [CVPR 2021] FMO Deblurring Benchmark ☆13 · Jan 12, 2022 · Updated 4 years ago
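- Chinese edition of a comprehensive list of Python resources, covering web frameworks, web crawlers, web content extraction, template engines, databases, data visualization, image processing, text processing, natural language processing, machine learning, logging, code analysis, and more ☆11 · May 24, 2016 · Updated 9 years ago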
- ☆15 · Nov 27, 2025 · Updated 2 months ago
- Hierarchical Vision Transformers for Disease Progression Detection in Chest X-Ray Images ☆11 · Jan 11, 2024 · Updated 2 years ago
- Predicting emotions on Android ☆11 · Nov 26, 2020 · Updated 5 years ago
- Official implementation of the paper "LTrack: Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Rep… ☆12 · Jul 26, 2023 · Updated 2 years ago
- Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models" ☆14 · Mar 19, 2025 · Updated 10 months ago
- Code for "RADSeg Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models" ☆28 · Jan 27, 2026 · Updated 2 weeks ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆96 · Jul 4, 2024 · Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆107 · Mar 6, 2025 · Updated 11 months ago
- ☆107 · Jun 10, 2025 · Updated 8 months ago
- [IPCAI'24 Best Paper] Advancing Surgical VQA with Scene Graph Knowledge ☆47 · May 23, 2025 · Updated 8 months ago