Giskard-AI / awesome-ai-safety
📚 A curated list of papers & technical articles on AI Quality & Safety
⭐ 166 · Updated last year
Alternatives and similar repositories for awesome-ai-safety:
Users interested in awesome-ai-safety are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. ⭐ 174 · Updated 10 months ago
- Red-Teaming Language Models with DSPy ⭐ 153 · Updated 9 months ago
- Helps you build better AI agents through debuggable unit testing ⭐ 141 · Updated this week
- Datasets and models for instruction-tuning ⭐ 232 · Updated last year
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ⭐ 92 · Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ⭐ 92 · Updated 10 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ⭐ 104 · Updated 4 months ago
- Automatic Evals for Instruction-Tuned Models ⭐ 100 · Updated this week
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ⭐ 323 · Updated 10 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ⭐ 192 · Updated this week
- Sample notebooks and prompts for LLM evaluation ⭐ 119 · Updated last month
- ReLM is a Regular Expression engine for Language Models ⭐ 103 · Updated last year
- 🤖🌊 aiFlows: The building blocks of your collaborative AI ⭐ 244 · Updated 8 months ago
- A curated list of awesome synthetic data tools (open source and commercial). ⭐ 133 · Updated last year
- Data cleaning and curation for unstructured text ⭐ 328 · Updated 5 months ago
- The Foundation Model Transparency Index ⭐ 73 · Updated 7 months ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ⭐ 213 · Updated last year
- RuLES: a benchmark for evaluating rule-following in language models ⭐ 215 · Updated this week
- Mixing Language Models with Self-Verification and Meta-Verification ⭐ 100 · Updated last month
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ⭐ 214 · Updated 8 months ago
- A framework to empower forecasting using Large Language Models (LLMs) ⭐ 104 · Updated 6 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ⭐ 313 · Updated 2 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ⭐ 106 · Updated 3 weeks ago
- Collection of evals for Inspect AI ⭐ 45 · Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ⭐ 99 · Updated 9 months ago
- Erasing concepts from neural representations with provable guarantees ⭐ 219 · Updated last month
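One entry above, RAGElo, ranks RAG-based LLM agents with an Elo ranker. As a rough illustration only (the `elo_update` helper below is a generic sketch of the classic Elo formula, not RAGElo's actual API), a single pairwise comparison updates both agents' ratings like this:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Update two Elo ratings after one head-to-head comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a draw.
    k controls how much a single result moves the ratings.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    # Each player gains (or loses) in proportion to the surprise.
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two equally rated agents; A wins the judged comparison.
print(elo_update(1000, 1000, 1.0))  # → (1016.0, 984.0)
```

Repeating this update over many judged head-to-head comparisons between agents makes the ratings converge toward a stable leaderboard, which is the general idea behind Elo-style evaluation of LLM systems.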