mlcommons / ailuminate
The AILuminate v1.1 benchmark suite is an AI risk assessment benchmark developed with broad involvement from leading AI companies, academia, and civil society.
☆65 Updated 7 months ago
Alternatives and similar repositories for ailuminate
Users who are interested in ailuminate are comparing it to the repositories listed below.
- Public repository containing METR's DVC pipeline for eval data analysis ☆183 Updated this week
- We track and analyze the activity and performance of autonomous code agents in the wild ☆48 Updated last month
- Let Claude control a web browser on your machine. ☆39 Updated 7 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆127 Updated 3 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆100 Updated 9 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆61 Updated 8 months ago
- An open-source compliance-centered evaluation framework for Generative AI models ☆178 Updated 3 weeks ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 Updated last week
- ☆89 Updated this week
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆86 Updated 10 months ago
- A Text-Based Environment for Interactive Debugging ☆289 Updated this week
- Work in progress! I don't recommend looking at the code right now. ☆24 Updated last month
- The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection. ☆48 Updated last month
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆130 Updated last week
- Transformer GPU VRAM estimator ☆67 Updated last year
- A catalogue of existing Nanda servers ☆190 Updated 8 months ago
- OpenAI Guardrails - Python ☆159 Updated 2 weeks ago
- explore token trajectory trees on instruct and base models ☆150 Updated 7 months ago
- Red-Teaming Language Models with DSPy ☆250 Updated 11 months ago
- Prompts used in the Automated Auditing Blog Post ☆134 Updated 5 months ago
- Test Generation for Prompts ☆148 Updated this week
- Your buddy in the (L)LM space. ☆64 Updated last year
- Code for the paper "Defeating Prompt Injections by Design" ☆212 Updated 7 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆256 Updated last week
- A library for building software agents using behavior trees and language models. ☆90 Updated 11 months ago
- lossily compress representation vectors using product quantization ☆59 Updated 2 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆173 Updated 2 weeks ago
- Controlled text generation with programmable constraints ☆170 Updated last week
- A suite of open-ended, non-imitative tasks involving generalizable skills for large language model chatbots and agents to enable bootstra… ☆43 Updated 11 months ago
- Track the progress of LLM context utilisation ☆55 Updated 9 months ago