Giskard-AI / awesome-ai-safety
A curated list of papers & technical articles on AI Quality & Safety
★ 188 · Updated 3 months ago
Alternatives and similar repositories for awesome-ai-safety
Users interested in awesome-ai-safety are comparing it to the libraries listed below:
- Fiddler Auditor is a tool to evaluate language models. ★ 184 · Updated last year
- ★ 267 · Updated 6 months ago
- The Foundation Model Transparency Index ★ 82 · Updated last year
- An open-source compliance-centered evaluation framework for Generative AI models ★ 159 · Updated this week
- Red-Teaming Language Models with DSPy ★ 203 · Updated 5 months ago
- Datasets and models for instruction-tuning ★ 238 · Updated last year
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ★ 94 · Updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ★ 130 · Updated this week
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker (a generic Elo-update sketch follows this list). ★ 114 · Updated 3 weeks ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ★ 113 · Updated last week
- A curated list of resources dedicated to synthetic data ★ 132 · Updated 3 years ago
- AI Verify ★ 27 · Updated this week
- ★ 41 · Updated last year
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query. ★ 168 · Updated last year
- Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data… ★ 206 · Updated this week
- ★ 244 · Updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ★ 105 · Updated 7 months ago
- Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs. ★ 25 · Updated 4 months ago
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ★ 401 · Updated last year
- Google DeepMind's PromptBreeder for automated prompt engineering, implemented in LangChain Expression Language. ★ 132 · Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★ 49 · Updated last year
- A tool for evaluating LLMs ★ 424 · Updated last year
- Continuous Integration for LLM-powered applications ★ 248 · Updated last year
- Disaggregators: Curated data labelers for in-depth analysis. ★ 66 · Updated 2 years ago
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ★ 262 · Updated this week
- A curated list of awesome synthetic data tools (open source and commercial). ★ 197 · Updated last year
- A joint community effort to create one central leaderboard for LLMs. ★ 304 · Updated 11 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ★ 99 · Updated this week
- AI Data Management & Evaluation Platform ★ 215 · Updated last year
- ★ 95 · Updated last year
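
The RAGElo entry above refers to ranking RAG agents with an Elo ranker. As a rough, generic illustration of that underlying idea (a minimal sketch of a pairwise Elo update, not RAGElo's actual API; all function names here are hypothetical):

```python
# Minimal, generic Elo-update sketch (hypothetical helpers, not RAGElo's API).
# After a judge compares two RAG agents' answers, the winner's rating rises
# and the loser's falls in proportion to how surprising the outcome was.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that agent A beats agent B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    score_a = 1.0 if a_wins else 0.0
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: agent A (1500) beats agent B (1500) on one query.
print(update_elo(1500.0, 1500.0, a_wins=True))  # -> (1516.0, 1484.0)
```

Repeating such pairwise updates over many judged comparisons yields a ranking of agents by rating, which is the general mechanism an Elo-based leaderboard relies on.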