ritzz-ai / GUI-R1
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
☆76Updated last week
Alternatives and similar repositories for GUI-R1
Users that are interested in GUI-R1 are comparing it to the libraries listed below
Sorting:
- A Self-Training Framework for Vision-Language Reasoning☆78Updated 3 months ago
- ☆117Updated 3 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆55Updated 8 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆107Updated 3 weeks ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.☆50Updated this week
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆54Updated last week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆56Updated 10 months ago
- ☆24Updated 3 months ago
- ☆25Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆73Updated 6 months ago
- VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues☆40Updated 2 months ago
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆34Updated 2 months ago
- ☆99Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆38Updated last month
- ☆73Updated 11 months ago
- ☆75Updated 4 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆87Updated last year
- A curated collection of resources, tools, and frameworks for developing GUI Agents.☆42Updated 3 weeks ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆90Updated last week
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆18Updated 7 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆54Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆73Updated 11 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆34Updated last month
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆44Updated 6 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.☆67Updated 2 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆79Updated 3 months ago
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆97Updated 9 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆78Updated 2 months ago
- A RLHF Infrastructure for Vision-Language Models☆175Updated 6 months ago