mlcommons / ailuminate
The AILuminate v1.1 benchmark suite is an AI risk assessment benchmark developed with broad involvement from leading AI companies, academia, and civil society.
☆65Updated 6 months ago
Alternatives and similar repositories for ailuminate
Users who are interested in ailuminate are comparing it to the libraries listed below.
- Public repository containing METR's DVC pipeline for eval data analysis☆164Updated 8 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?☆233Updated last month
- A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset.☆62Updated 6 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.☆100Updated 8 months ago
- ☆33Updated 8 months ago
- Your buddy in the (L)LM space.☆64Updated last year
- Prompts used in the Automated Auditing Blog Post☆127Updated 5 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles☆61Updated 7 months ago
- Framework for specifying and proving properties—such as robustness, fairness, and interpretability—of machine learning models using Lean …☆73Updated 4 months ago
- Model Context Protocol (MCP) server for constraint optimization and solving☆145Updated 3 months ago
- BlindBox is a tool to isolate and deploy applications inside Trusted Execution Environments for privacy-by-design apps☆63Updated 2 years ago
- The Granite Guardian models are designed to detect risks in prompts and responses.☆123Updated 2 months ago
- explore token trajectory trees on instruct and base models☆149Updated 6 months ago
- The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.☆47Updated last month
- Multi-language code navigation API in a container☆95Updated 4 months ago
- An open-source compliance-centered evaluation framework for Generative AI models☆177Updated this week
- LLM plugin for clustering embeddings☆82Updated last year
- Flask app for article abstract and listing pages☆175Updated last week
- Pivotal Token Search☆141Updated last week
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆60Updated last year
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research.☆125Updated last month
- Controlled text generation with programmable constraints☆168Updated this week
- lossily compress representation vectors using product quantization☆59Updated last month
- ☆81Updated last week
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆82Updated last year
- A better way of testing, inspecting, and analyzing AI Agent traces.☆40Updated 2 months ago
- A Text-Based Environment for Interactive Debugging☆286Updated last week
- Alice in Wonderland code base for experiments and raw experiments data☆131Updated 3 months ago
- Using Large Language Models for Repo-wide Type Prediction☆112Updated 2 years ago
- Minimal open-source implementation of AlphaProof [WIP]☆54Updated last week