OpenDFM / MobA
🎮 Manipulates mobile phones just as you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automation".
☆12 · Updated last week
Related projects
Alternatives and complementary repositories for MobA
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 · Updated 6 months ago
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆16 · Updated this week
- Implementation of PaLM2-VAdapter from the multimodal model paper "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… ☆16 · Updated this week
- Official implementation of the ECCV24 paper: POA ☆24 · Updated 3 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities"; they haven't rel… ☆11 · Updated 9 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor vision-language models. ☆16 · Updated 3 weeks ago
- Official PyTorch implementation of Self-emerging Token Labeling ☆30 · Updated 7 months ago
- Simple implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated this week
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆17 · Updated 3 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 · Updated 4 months ago
- Lottery Ticket Adaptation ☆36 · Updated last month
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆34 · Updated 3 weeks ago
- The Official Code Repository for GUI-World. ☆37 · Updated 3 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆27 · Updated last week
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video… ☆27 · Updated 4 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 · Updated 10 months ago
- DPO, but faster 🚀 ☆21 · Updated 2 weeks ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆25 · Updated 4 months ago
- A Data Source for Reasoning Embodied Agents ☆19 · Updated last year
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆17 · Updated last week
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆14 · Updated 8 months ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper: "Multimodal Contrastive Learning with… ☆22 · Updated last week
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated 7 months ago
- A vast array of Multi-Modal Embodied Robotic Foundation Models! ☆24 · Updated 7 months ago
- Official repository for the paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track) ☆43 · Updated last week
- A simple PyTorch implementation of Flash MultiHead Attention ☆14 · Updated 9 months ago