Giskard-AI / awesome-ai-safety
A curated list of papers & technical articles on AI Quality & Safety
★191 · Updated 4 months ago
Alternatives and similar repositories for awesome-ai-safety
Users interested in awesome-ai-safety are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. (★187, updated last year)
- (★267, updated 7 months ago)
- Red-Teaming Language Models with DSPy (★212, updated 6 months ago)
- An open-source compliance-centered evaluation framework for generative AI models (★161, updated this week)
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act (★94, updated last year)
- Datasets and models for instruction-tuning (★238, updated last year)
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate (★114, updated last month)
- A curated list of resources dedicated to synthetic data (★132, updated 3 years ago)
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker (★114, updated this week)
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… (★406, updated last year)
- Data cleaning and curation for unstructured text (★328, updated last year)
- (★247, updated 5 months ago)
- A tool for evaluating LLMs (★424, updated last year)
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning (★46, updated last year)
- Mixing Language Models with Self-Verification and Meta-Verification (★105, updated 8 months ago)
- Curation of prompts that are known to be adversarial to large language models (★185, updated 2 years ago)
- Large Language Model (LLM) Inference API and Chatbot (★126, updated last year)
- AI Verify (★28, updated this week)
- The Foundation Model Transparency Index (★83, updated last year)
- (★337, updated last year)
- A repository of Language Model Vulnerabilities and Exposures (LVEs) (★113, updated last year)
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… (★49, updated last year)
- (★42, updated last year)
- AI Data Management & Evaluation Platform (★216, updated last year)
- Toolkit for attaching, training, saving, and loading of new heads for transformer models (★285, updated 5 months ago)
- Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data… (★206, updated this week)
- Open-source LLM toolkit to build trustworthy LLM applications: TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) (★398, updated last year)
- Automatically evaluate your LLMs in Google Colab (★656, updated last year)
- Moonshot: a simple and modular tool to evaluate and red-team any LLM application (★266, updated 3 weeks ago)
- A curated list of awesome publications and researchers on prompting frameworks, updated and maintained by The Intelligent System Security (… (★84, updated 7 months ago)