Web-grounded natural language instructions
☆18 · Nov 25, 2024 · Updated last year
Alternatives and similar repositories for turking-bench
Users that are interested in turking-bench are comparing it to the libraries listed below
- Natural Perturbation for Robust Question Answering ☆12 · Apr 7, 2020 · Updated 5 years ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆65 · Oct 19, 2024 · Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 · Dec 29, 2024 · Updated last year
- MCPL: MULTI-CONCEPT PROMPT LEARNING ☆20 · May 27, 2024 · Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 · Feb 27, 2025 · Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆26 · Jan 14, 2025 · Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Feb 23, 2024 · Updated 2 years ago
- ☆34 · Mar 6, 2025 · Updated last year
- ☆30 · Jun 25, 2024 · Updated last year
- WONDERBREAD benchmark + dataset for BPM tasks ☆34 · Jul 30, 2025 · Updated 7 months ago
- ☆17 · Sep 1, 2024 · Updated last year
- ☆123 · Jun 6, 2024 · Updated last year
- Self-hosted GPT-4V api ☆27 · Nov 6, 2023 · Updated 2 years ago
- ☆37 · Oct 2, 2024 · Updated last year
- This repository contains the registries for components, agents and services, the second part of the autonolas-v1 protocol. ☆15 · Updated this week
- An unofficial Python 3 version of jemdoc. ☆11 · Feb 8, 2026 · Updated last month
- ☆13 · Apr 27, 2021 · Updated 4 years ago
- m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks ☆45 · Sep 26, 2024 · Updated last year
- ☆87 · Dec 15, 2023 · Updated 2 years ago
- Kait's Site ☆14 · Sep 7, 2021 · Updated 4 years ago
- SmaliAnalyzer parses disassembled bytecode of Android applications to gather as much information as possible about their component classe… ☆13 · Apr 17, 2019 · Updated 6 years ago
- For engineers seeking a fast, memory-efficient database, Rapto provides transposition-heuristic storage, low memory footprint and high-pe… ☆22 · Oct 10, 2025 · Updated 4 months ago
- ☆19 · Mar 2, 2026 · Updated last week
- An Awesome, Feature Rich Discord Bot for Hosting and Managing CTF Challenges on Discord Written in Python3 ☆11 · Jun 29, 2024 · Updated last year
- A platform aimed at creating websites that perform self-optimization ☆12 · May 4, 2024 · Updated last year
- Designed to help lawyers and legal professionals find precedent fast and prepare for case negotiations by simulating trajectories ☆10 · Oct 16, 2024 · Updated last year
- The repository provides code for the paper RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders, CIKM'24 ☆11 · Oct 21, 2024 · Updated last year
- ☆12 · Feb 22, 2021 · Updated 5 years ago
- VisualWebArena is a benchmark for multimodal agents. ☆440 · Nov 9, 2024 · Updated last year
- Lucene open-domain QA retrieval in python ☆11 · Feb 18, 2021 · Updated 5 years ago
- ☆10 · May 20, 2019 · Updated 6 years ago
- The Wasm Course ☆11 · Jan 16, 2024 · Updated 2 years ago
- Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset ☆13 · Nov 19, 2022 · Updated 3 years ago
- BLEU Score in Rust ☆12 · Mar 1, 2026 · Updated last week
- Ray and Anyscale for UC Berkeley AI Hackathon! ☆11 · Jun 17, 2023 · Updated 2 years ago
- ☆13 · Sep 2, 2021 · Updated 4 years ago
- Toy distributed PostgreSQL by implementing SQL over KV ☆11 · Jan 14, 2026 · Updated last month
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),… ☆13 · Jan 16, 2024 · Updated 2 years ago
- Code for running forward and backward versions of GPT2 ☆10 · Nov 20, 2021 · Updated 4 years ago