Reallm-Labs / InfiGUI-R1
Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners"
☆16 Updated last week
Alternatives and similar repositories for InfiGUI-R1:
Users interested in InfiGUI-R1 are comparing it to the repositories listed below:
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆26 Updated 8 months ago
- Multimodal RewardBench ☆38 Updated 2 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆26 Updated last month
- ☆29 Updated 7 months ago
- Source code of paper: A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models. ☆18 Updated 3 weeks ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆54 Updated this week
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆58 Updated last year
- ☆73 Updated 3 months ago
- ☆18 Updated 5 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆35 Updated 3 weeks ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆46 Updated this week
- A Self-Training Framework for Vision-Language Reasoning ☆76 Updated 3 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆82 Updated last year
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs) ☆46 Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆102 Updated last year
- [ACL 2024] The project of Symbol-LLM ☆54 Updated 9 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆41 Updated 4 months ago
- ☆40 Updated 3 weeks ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆43 Updated 5 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆87 Updated last year
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆55 Updated 6 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆24 Updated last month
- [NeurIPS 2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆18 Updated 6 months ago
- ☆40 Updated 3 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆46 Updated 6 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆29 Updated last month
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆74 Updated this week
- ☆63 Updated last year
- An Illusion of Progress? Assessing the Current State of Web Agents ☆38 Updated last week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 Updated 2 months ago