JHU-CLSP / turking-bench
Web-grounded natural language instructions
☆18Nov 25, 2024Updated last year
Alternatives and similar repositories for turking-bench
Users that are interested in turking-bench are comparing it to the libraries listed below
- Natural Perturbation for Robust Question Answering☆12Apr 7, 2020Updated 5 years ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Dec 29, 2024Updated last year
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆136Jul 17, 2024Updated last year
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆48Feb 27, 2025Updated 11 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆26Jan 14, 2025Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Feb 23, 2024Updated last year
- ☆30Jun 25, 2024Updated last year
- ☆34Mar 6, 2025Updated 11 months ago
- ☆17Sep 1, 2024Updated last year
- ☆123Jun 6, 2024Updated last year
- Self-hosted GPT-4V api☆27Nov 6, 2023Updated 2 years ago
- ☆37Oct 2, 2024Updated last year
- Code for "Interactive Task Planning with Language Models"☆33Jan 12, 2026Updated last month
- This repository contains the registries for components, agents and services, the second part of the autonolas-v1 protocol.☆15Updated this week
- An unofficial Python 3 version of jemdoc.☆10Feb 8, 2026Updated last week
- m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks☆44Sep 26, 2024Updated last year
- ☆87Dec 15, 2023Updated 2 years ago
- For engineers seeking a fast, memory-efficient database, Rapto provides transposition-heuristic storage, low memory footprint and high-pe…☆22Oct 10, 2025Updated 4 months ago
- The repository provides code for the paper RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders, CIKM'24☆11Oct 21, 2024Updated last year
- Ask AI to test your website with a specific goal☆15Dec 22, 2023Updated 2 years ago
- An Awesome, Feature Rich Discord Bot for Hosting and Managing CTF Challenges on Discord Written in Python3☆11Jun 29, 2024Updated last year
- This repository contains numerous small utility packages. These packages serve various useful purposes and are written in nano ESModule w…☆10Updated this week
- A platform aimed at creating websites that perform self-optimization☆12May 4, 2024Updated last year
- Kait's Site☆14Sep 7, 2021Updated 4 years ago
- SmaliAnalyzer parses disassembled bytecode of Android applications to gather as much information as possible about their component classe…☆13Apr 17, 2019Updated 6 years ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents☆21Jan 6, 2026Updated last month
- ☆19Updated this week
- Concurrent data extraction from unstructured text and images using AI models.☆18Aug 10, 2025Updated 6 months ago
- ☆12Feb 22, 2021Updated 4 years ago
- VisualWebArena is a benchmark for multimodal agents.☆431Nov 9, 2024Updated last year
- An implementation of an augmented red-black tree in Zig with 3 layers of abstraction☆12Aug 27, 2025Updated 5 months ago
- Corpus to accompany: "Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding"☆11Apr 11, 2025Updated 10 months ago
- A flexible data structure for multi-input multi-output models☆10Oct 12, 2021Updated 4 years ago
- Lucene open-domain QA retrieval in python☆11Feb 18, 2021Updated 4 years ago
- Ray and Anyscale for UC Berkeley AI Hackathon!☆11Jun 17, 2023Updated 2 years ago
- ☆12Jun 30, 2023Updated 2 years ago
- A distributed execution framework built upon lunatic.☆16Jan 19, 2024Updated 2 years ago
- ☆13Sep 2, 2021Updated 4 years ago
- Toy distributed PostgreSQL by implementing SQL over KV☆11Jan 14, 2026Updated last month