boyugou / GUI-Agents-Paper-List

Building a comprehensive and handy list of papers for GUI agents

☆34

Related projects

Alternatives and complementary repositories for GUI-Agents-Paper-List

wjn1996 / Chain-of-Knowledge
☆16 Updated last year
OSU-NLP-Group / LLM-Knowledge-Conflict
[ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"
☆61 Updated 7 months ago
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆36 Updated last month
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆49 Updated 9 months ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆102 Updated 2 months ago
GAIR-NLP / ReasonEval
Evaluating Mathematical Reasoning Beyond Accuracy
☆37 Updated 7 months ago
zhaochen0110 / conflictbank
Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and…
☆29 Updated last month
GAIR-NLP / weak-to-strong-reasoning
☆54 Updated 2 months ago
GAIR-NLP / alignment-for-honesty
☆66 Updated 6 months ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆115 Updated 2 months ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆104 Updated 5 months ago
siyuyuan / coscript
Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning
☆35 Updated last year
koalazf99 / tacube
[EMNLP 2022] TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
☆17 Updated last year
koalazf99 / Awesome-DataCentric-LLM
Trending projects & awesome papers about data-centric LLM studies.
☆31 Updated 2 weeks ago
zorazrw / trove
[ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
☆22 Updated 2 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆99 Updated 3 weeks ago
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆105 Updated 6 months ago
bigai-nlco / LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
☆167 Updated last month
starrYYxuan / LeCo
This is the implementation of LeCo
☆27 Updated 4 months ago
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆28 Updated 4 months ago
Junjie-Ye / ToolEyes
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆62 Updated 7 months ago
Timothyxxx / EnvInteractiveLMPapers
Paper collection of methods that use language to interact with environments, including interacting with the real world, simulated worlds, or WWW…
☆123 Updated last year
magicgh / Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆13 Updated last month
liujch1998 / rainier
☆28 Updated 9 months ago
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆111 Updated last month
microsoft / LEMA
Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"
☆54 Updated 11 months ago
sail-sg / symbolic-instruction-tuning
The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".
☆62 Updated last year
chujiezheng / LLM-Extrapolation
Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"
☆68 Updated 5 months ago
xhan77 / context-aware-decoding
☆28 Updated this week
PremiLab-Math / MathCheck
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
☆30 Updated last month