OpenDFM / MobA
🎮 Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automation".
☆20 · Updated 5 months ago
Alternatives and similar repositories for MobA:
Users interested in MobA are comparing it to the libraries listed below:
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 · Updated 4 months ago
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆10 · Updated 5 months ago
- ☆35 · Updated last month
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated last month
- ☆32 · Updated 2 months ago
- Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆15 · Updated 3 weeks ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆50 · Updated 3 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated last year
- The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model" ☆22 · Updated 2 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆13 · Updated last week
- ☆27 · Updated last month
- Control LLM ☆14 · Updated this week
- ☆57 · Updated 6 months ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 · Updated last year
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆13 · Updated last year
- ☆13 · Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 · Updated last month
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- Lottery Ticket Adaptation ☆39 · Updated 4 months ago
- ☆16 · Updated 5 months ago
- ☆36 · Updated 2 years ago
- ☆24 · Updated 6 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆58 · Updated last month
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆36 · Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆64 · Updated last week
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆25 · Updated 8 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆24 · Updated last week
- ☆16 · Updated 8 months ago
- ☆14 · Updated last month
- Official PyTorch Implementation of Self-emerging Token Labeling ☆32 · Updated last year