gimme1dollar / b-moca

Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation)

☆23

Related projects:

microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). It uses a variety of games to test important capabilities of LLMs as agents. …
☆115 Updated 5 months ago
CraftJarvis / GROOT
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
☆54 Updated 9 months ago
X-LANCE / Mobile-Env
A Universal Platform for Training and Evaluation of Mobile Interaction
☆31 Updated last month
DeckardAgent / deckard
Official implementation of the DECKARD Agent from the paper "Do Embodied Agents Dream of Pixelated Sheep?"
☆84 Updated last year
ygjin11 / r2-play
The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction".
☆32 Updated 7 months ago
google-research / android_world
AndroidWorld is an environment and benchmark for autonomous agents
☆86 Updated this week
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents
☆81 Updated last week
CraftJarvis / MC-Controller
Implementation of "Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction"
☆42 Updated last year
abdulhaim / LMRL-Gym
☆65 Updated 2 months ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆84 Updated 5 months ago
agentification / RAFA_code
☆131 Updated 4 months ago
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆48 Updated 3 weeks ago
conglu1997 / intelligent-go-explore
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
☆41 Updated 3 months ago
abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆24 Updated last week
vmicheli / delta-iris
Efficient World Models with Context-Aware Tokenization. ICML 2024
☆73 Updated 2 months ago
GuanSuns / LLMs-World-Models-for-Planning
The source code of the paper "Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Pla…
☆69 Updated last month
szxiangjn / world-model-for-language-model
☆102 Updated 2 months ago
PKU-RL / Creative-Agents
☆37 Updated 9 months ago
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆147 Updated 6 months ago
xlang-ai / text2reward
[ICLR 2024] Code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning"
☆113 Updated 8 months ago
facebookresearch / rlfh-gen-div
Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity"
☆35 Updated 8 months ago
bigai-nlco / langsuite
Official Repo of LangSuitE
☆74 Updated last month
BladeTransformerLLC / OvercookedGPT
An OpenAI gym environment to evaluate the ability of LLMs (e.g. GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult…
☆61 Updated last year
minerllabs / basalt-benchmark
BASALT Benchmark datasets, evaluation code and agent training example.
☆19 Updated 9 months ago
csmile-1006 / ARP
Guide Your Agent with Adaptive Multimodal Rewards (NeurIPS 2023 Accepted)
☆32 Updated 11 months ago
Asap7772 / understanding-rlhf
☆23 Updated 4 months ago
ZJLAB-AMMI / LLM4RL
An RL approach to enable cost-effective, intelligent interactions between a local agent and a remote LLM
☆60 Updated 3 weeks ago
DigiRL-agent / digirl
Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".
☆200 Updated last month
WeihaoTan / TWOSOME
Implementation of TWOSOME
☆42 Updated 4 months ago
CraftJarvis / OmniJarvis
☆21 Updated 2 months ago