OpenDFM / MobA
🎮 Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automation".
☆23 Updated last month
Alternatives and similar repositories for MobA
Users that are interested in MobA are comparing it to the libraries listed below
- Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents" ☆27 Updated 2 weeks ago
- ☆55 Updated 6 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 Updated 6 months ago
- ☆32 Updated 4 months ago
- ☆40 Updated 3 weeks ago
- Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding ☆51 Updated 5 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆28 Updated 10 months ago
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆15 Updated 7 months ago
- ☆65 Updated 2 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆12 Updated last year
- A tool to generate datasets from docs ☆12 Updated 2 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆38 Updated 8 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆59 Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 Updated 3 months ago
- (ICLR 2025) The Official Code Repository for GUI-World. ☆57 Updated 5 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025) ☆28 Updated 5 months ago
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated last year
- Official Implementation of APB (ACL 2025 main) ☆28 Updated 3 months ago
- ☆13 Updated 9 months ago
- ☆24 Updated 8 months ago
- An open-source toolkit helping developers build natural language database query solutions ☆14 Updated 3 weeks ago
- ☆24 Updated last week
- ☆35 Updated 8 months ago
- ☆36 Updated 2 years ago
- ☆18 Updated last year
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆14 Updated 2 months ago
- A Data Source for Reasoning Embodied Agents ☆19 Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆20 Updated 2 months ago
- Multi-Layer Key-Value sharing experiments on Pythia models ☆33 Updated 11 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆25 Updated 3 weeks ago