boyugou / GUI-Agents-Paper-List
Building a comprehensive and handy list of papers for GUI agents
☆34Updated this week
Related projects
Alternatives and complementary repositories for GUI-Agents-Paper-List
- ☆16Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆61Updated 7 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆36Updated last month
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆49Updated 9 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions☆102Updated 2 months ago
- Evaluating Mathematical Reasoning Beyond Accuracy☆37Updated 7 months ago
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and…☆29Updated last month
- ☆54Updated 2 months ago
- ☆66Updated 6 months ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆115Updated 2 months ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆104Updated 5 months ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning☆35Updated last year
- [EMNLP 2022] TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data☆17Updated last year
- Trending projects & awesome papers about data-centric LLM studies.☆31Updated 2 weeks ago
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks☆22Updated 2 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆99Updated 3 weeks ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆105Updated 6 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆167Updated last month
- This is the implementation of LeCo☆27Updated 4 months ago
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…☆28Updated 4 months ago
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆62Updated 7 months ago
- Paper collection of methods that use language to interact with an environment, including the real world, simulated worlds, or the WWW…☆123Updated last year
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents☆13Updated last month
- ☆28Updated 9 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆111Updated last month
- Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆54Updated 11 months ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".☆62Updated last year
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆68Updated 5 months ago
- ☆28Updated this week
- Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist☆30Updated last month