Samsung / ClickAgent
ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents
☆12 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for ClickAgent
- ☆21 Updated last month
- Exploration of Adept's multimodal fuyu-8b model. 🤓 🔍 ☆28 Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 Updated 3 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆17 Updated 3 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video… ☆29 Updated 5 months ago
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆24 Updated last week
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆16 Updated 2 weeks ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆62 Updated 3 weeks ago
- ☆35 Updated 5 months ago
- Representing Rule-based Chatbots with Transformers ☆18 Updated 4 months ago
- Source code for EMNLP2022 long paper: Parameter-Efficient Tuning Makes a Good Classification Head ☆13 Updated 2 years ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated 8 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆33 Updated last year
- 🎮 Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automati… ☆13 Updated 3 weeks ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆43 Updated 7 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆38 Updated 7 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 Updated 9 months ago
- This project covers experiments with and applications of Yi's multimodal model series, such as Yi-VL-6B/34B. ☆12 Updated 9 months ago
- ☆19 Updated last month
- Implementation of the DocLLM paper for Llama models. ☆12 Updated 3 weeks ago
- Official code for infimm-hd ☆15 Updated 2 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆23 Updated last year
- ☆17 Updated last year
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆21 Updated 11 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆30 Updated 4 months ago
- ☆21 Updated this week
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆69 Updated last week
- This repository collects and shares awesome ChatGPT-related papers and useful tools ☆17 Updated last year
- A curated list of papers, repositories, tutorials, and anything else related to large language models for tools ☆65 Updated last year