invariantlabs-ai / playwright-computer-use
Let Claude control a web browser on your machine.
☆17 Updated last month
Alternatives and similar repositories for playwright-computer-use:
Users who are interested in playwright-computer-use are comparing it to the repositories listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆30 Updated this week
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆50 Updated 2 weeks ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Updated 7 months ago
- Agent computer interface for AI software engineer. ☆51 Updated this week
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆31 Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆52 Updated 3 months ago
- ☆20 Updated 4 months ago
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆24 Updated last week
- Sphynx Hallucination Induction ☆53 Updated last month
- An open-source AI podcast creator ☆14 Updated 4 months ago
- A framework-less approach to robust agent development. ☆156 Updated last week
- ☆50 Updated 4 months ago
- A text-to-SQL prototype on the northwind sqlite dataset ☆12 Updated 6 months ago
- Pin files for contextual, codebase-level AI assistance. ☆15 Updated 8 months ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆32 Updated last week
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆22 Updated 2 months ago
- Scaling inference-time compute for LLM-as-a-judge, automated evaluations, guardrails, and reinforcement learning. ☆189 Updated last week
- Clue inspired puzzles for testing LLM deduction abilities ☆31 Updated this week
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆72 Updated last week
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆46 Updated 5 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated this week
- Turing machines, Rule 110, and A::B reversal using Claude 3 Opus. ☆59 Updated 10 months ago
- One Line To Build Zero-Data Classifiers in Minutes ☆36 Updated 6 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆25 Updated 9 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration ☆42 Updated last month
- never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you… ☆37 Updated 10 months ago
- Lego for GRPO ☆25 Updated last week
- ☆36 Updated last month
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆129 Updated last week