mlcommons / ailuminate
The AILuminate v1.1 benchmark suite is an AI risk assessment benchmark developed with broad involvement from leading AI companies, academia, and civil society.
☆70 · Updated 7 months ago
Alternatives and similar repositories for ailuminate
Users interested in ailuminate are comparing it to the repositories listed below.
- Model Context Protocol (MCP) server for constraint optimization and solving ☆150 · Updated 4 months ago
- ☆93 · Updated last week
- LLM plugin for clustering embeddings ☆82 · Updated last year
- Prompts used in the Automated Auditing Blog Post ☆137 · Updated 6 months ago
- Using Large Language Models for Repo-wide Type Prediction ☆114 · Updated 2 years ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆199 · Updated last week
- Your buddy in the (L)LM space. ☆64 · Updated last year
- The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection. ☆49 · Updated 2 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆58 · Updated 10 months ago
- Pivotal Token Search ☆144 · Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆130 · Updated 4 months ago
- Let Claude control a web browser on your machine. ☆40 · Updated 8 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆259 · Updated last month
- ☆34 · Updated 9 months ago
- A library for building software agents using behavior trees and language models. ☆90 · Updated last year
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆100 · Updated 9 months ago
- A cookiecutter template for creating a new LLM plugin that adds tools to LLM ☆28 · Updated 8 months ago
- Run models distributed as GGUF files using LLM ☆84 · Updated last year
- Access the Cohere Command R family of models ☆38 · Updated 10 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆61 · Updated 9 months ago
- Prototype advanced LLM algorithms for reasoning and planning. ☆99 · Updated last year
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆134 · Updated this week
- A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset. ☆63 · Updated 2 weeks ago
- A better way of testing, inspecting, and analyzing AI agent traces. ☆47 · Updated 3 weeks ago
- A Text-Based Environment for Interactive Debugging ☆294 · Updated this week
- Hierarchical topic segmentation of meeting transcripts using embeddings and divisive clustering. ☆54 · Updated last year
- CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. ☆65 · Updated last year
- We track and analyze the activity and performance of autonomous code agents in the wild. ☆48 · Updated 2 months ago
- ☆86 · Updated this week
- Alice in Wonderland code base for experiments and raw experiments data ☆131 · Updated this week