OpenDFM / MobA
Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automation".
☆17 · Updated 4 months ago
Alternatives and similar repositories for MobA:
Users who are interested in MobA are comparing it to the repositories listed below.
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆10 · Updated 4 months ago
- ☆30 · Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 · Updated last week
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated 3 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆56 · Updated last week
- Official implementation of ECCV24 paper: POA ☆24 · Updated 7 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆17 · Updated 2 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆48 · Updated 2 months ago
- The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model" ☆22 · Updated last month
- ☆16 · Updated 10 months ago
- ☆35 · Updated last week
- Official PyTorch Implementation of Self-emerging Token Labeling ☆32 · Updated 11 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆24 · Updated 7 months ago
- ☆18 · Updated 6 months ago
- ☆13 · Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 8 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated 11 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆24 · Updated 2 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆37 · Updated 5 months ago
- EfficientSAM + YOLO World base model for use with Autodistill. ☆9 · Updated last year
- ☆28 · Updated 5 months ago
- (ICLR 2025) The Official Code Repository for GUI-World. ☆53 · Updated 2 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆20 · Updated last month
- The official repo of continuous speculative decoding ☆24 · Updated 3 months ago
- ☆28 · Updated 6 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 · Updated 4 months ago