OpenDFM / MobA
Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automation".
☆25 Updated 3 months ago
Alternatives and similar repositories for MobA
Users that are interested in MobA are comparing it to the libraries listed below
- ☆56 Updated 8 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆12 Updated last year
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 Updated 5 months ago
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated last year
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆91 Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆46 Updated 4 months ago
- Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents" ☆37 Updated 2 months ago
- The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model" ☆21 Updated 5 months ago
- ☆35 Updated 2 years ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆52 Updated 7 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 Updated 8 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆39 Updated 10 months ago
- Vision-oriented multimodal AI ☆49 Updated last year
- ☆66 Updated 3 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆28 Updated 11 months ago
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆76 Updated last week
- ☆71 Updated 2 weeks ago
- ☆24 Updated 10 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆34 Updated last year
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆94 Updated this week
- ☆46 Updated 2 months ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆10 Updated 7 months ago
- (ICLR 2025) The Official Code Repository for GUI-World. ☆61 Updated 7 months ago
- Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆19 Updated last month
- ☆33 Updated 6 months ago
- Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. ☆10 Updated last year
- Multi-Layer Key-Value sharing experiments on Pythia models ☆33 Updated last year
- GPT-4V in Wonderland: LMMs as Smartphone Agents ☆133 Updated last year
- EfficientSAM + YOLO World base model for use with Autodistill. ☆10 Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 Updated 4 months ago