Giskard-AI / awesome-ai-safety
A curated list of papers & technical articles on AI Quality & Safety
★172 · Updated last year
Alternatives and similar repositories for awesome-ai-safety:
Users interested in awesome-ai-safety are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. ★178 · Updated last year
- Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs. ★23 · Updated 2 weeks ago
- ★263 · Updated 2 months ago
- Datasets and models for instruction-tuning ★238 · Updated last year
- Open Implementations of LLM Analyses ★103 · Updated 5 months ago
- ★42 · Updated 7 months ago
- A tool for evaluating LLMs ★410 · Updated 10 months ago
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ★93 · Updated last year
- Red-Teaming Language Models with DSPy ★175 · Updated last month
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ★98 · Updated last year
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ★197 · Updated last week
- The Foundation Model Transparency Index ★77 · Updated 10 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ★83 · Updated this week
- Data cleaning and curation for unstructured text ★329 · Updated 7 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ★72 · Updated last week
- ★221 · Updated this week
- Erasing concepts from neural representations with provable guarantees ★226 · Updated 2 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ★102 · Updated this week
- A joint community effort to create one central leaderboard for LLMs. ★294 · Updated 7 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ★230 · Updated 9 months ago
- A framework-less approach to robust agent development. ★156 · Updated this week
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ★504 · Updated 9 months ago
- A framework to empower forecasting using Large Language Models (LLMs) ★105 · Updated 8 months ago
- Collection of evals for Inspect AI ★101 · Updated this week
- A curated list of awesome resources for Artificial Intelligence Alignment research ★69 · Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ★349 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ★102 · Updated 3 months ago
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ★307 · Updated 9 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ★230 · Updated last month
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ★185 · Updated last year