OSU-NLP-Group / Online-Mind2Web
An Illusion of Progress? Assessing the Current State of Web Agents
☆38 · Updated this week
Alternatives and similar repositories for Online-Mind2Web:
Users who are interested in Online-Mind2Web are comparing it to the libraries listed below.
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ☆28 · Updated 2 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆68 · Updated 2 weeks ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆47 · Updated 5 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆60 · Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆134 · Updated 5 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆76 · Updated 3 months ago
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆56 · Updated 3 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆134 · Updated 4 months ago
- A Survey on the Honesty of Large Language Models ☆57 · Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆75 · Updated last week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆115 · Updated last month
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆54 · Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 4 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆67 · Updated last year
- A Comprehensive Survey on Long Context Language Modeling ☆131 · Updated 3 weeks ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆55 · Updated 6 months ago
- Paper collection of methods that use language to interact with environments, including the real world, simulated worlds, or WWW… ☆127 · Updated last year
- [EMNLP 2024] Multi-modal reasoning problems via code generation. ☆22 · Updated 2 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆92 · Updated last week