aiverify-foundation / aiverify
AI Verify
☆145 · Updated this week
Alternatives and similar repositories for aiverify:
Users interested in aiverify are comparing it to the libraries listed below.
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ☆226 · Updated this week
- Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics) ☆32 · Updated this week
- Fiddler Auditor is a tool to evaluate language models. ☆179 · Updated last year
- An open-source compliance-centered evaluation framework for Generative AI models ☆147 · Updated 4 months ago
- Red-Teaming Language Models with DSPy ☆183 · Updated 2 months ago
- This repository stems from our paper, “Cataloguing LLM Evaluations”, and serves as a living, collaborative catalogue of LLM evaluation fr… ☆17 · Updated last year
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆178 · Updated last week
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ☆94 · Updated last year
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆88 · Updated this week
- Repository of tools, resources and guidance for real-world AI governance ☆19 · Updated last week
- Collection of evals for Inspect AI ☆115 · Updated this week
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆294 · Updated 3 months ago
- This is an open-source tool to assess and improve the trustworthiness of AI systems. ☆90 · Updated 2 weeks ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆130 · Updated 3 weeks ago
- Inspect: A framework for large language model evaluations (a minimal usage sketch follows this list) ☆903 · Updated this week
- A toolkit of tools and techniques related to the privacy and compliance of AI models. ☆100 · Updated 9 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆109 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆102 · Updated last year
- A curated list of awesome synthetic data tools (open source and commercial). ☆174 · Updated last year
- A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podc… ☆71 · Updated this week
- LLM Self Defense: By Self Examination, LLMs know they are being tricked ☆32 · Updated 11 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆284 · Updated 7 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆78 · Updated last month
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆114 · Updated last week
- A small library of LLM judges ☆182 · Updated last week
- Find the samples in the test data on which your (generative) model makes mistakes. ☆26 · Updated 6 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆111 · Updated 11 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆240 · Updated 10 months ago
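
Several of the entries above are runnable eval frameworks rather than static catalogues. As a rough illustration of what using one looks like, here is a minimal sketch of defining and running a task with Inspect (listed above). The task name, toy samples, and model string are assumptions made for this example, not taken from any listed repository.

```python
# A minimal sketch of an Inspect eval, assuming `pip install inspect-ai`
# and a provider API key in the environment. The task name, samples, and
# model string below are illustrative placeholders.
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def arithmetic() -> Task:
    return Task(
        # Two toy samples; a real eval would load a dataset from a file.
        dataset=[
            Sample(input="What is 2 + 2? Reply with the number only.", target="4"),
            Sample(input="What is 3 * 5? Reply with the number only.", target="15"),
        ],
        solver=generate(),  # one model completion per sample
        scorer=exact(),     # exact-match scoring against each target
    )

if __name__ == "__main__":
    # Any provider/model pair Inspect supports can be substituted here.
    eval(arithmetic(), model="openai/gpt-4o-mini")
```

The same task can also be launched from the command line with `inspect eval <file> --model <model>`, which is often more convenient for batch runs and produces the log files that Inspect's viewer renders.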