Giskard-AI / awesome-ai-safety
A curated list of papers & technical articles on AI Quality & Safety
★182 · Updated last month
Alternatives and similar repositories for awesome-ai-safety
Users interested in awesome-ai-safety are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. ★181 · Updated last year
- An open-source compliance-centered evaluation framework for Generative AI models ★152 · Updated 3 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★49 · Updated 10 months ago
- Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs. ★23 · Updated 2 months ago
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ★94 · Updated last year
- Datasets and models for instruction-tuning ★238 · Updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ★118 · Updated 3 weeks ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ★110 · Updated 8 months ago
- ★234 · Updated 2 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ★239 · Updated 3 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ★92 · Updated this week
- Domain Adapted Language Modeling Toolkit - E2E RAG ★322 · Updated 6 months ago
- Red-Teaming Language Models with DSPy ★195 · Updated 3 months ago
- ★266 · Updated 4 months ago
- Data cleaning and curation for unstructured text ★327 · Updated 10 months ago
- The Foundation Model Transparency Index ★79 · Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ★172 · Updated 8 months ago
- Open Implementations of LLM Analyses ★103 · Updated 7 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ★423 · Updated last year
- Erasing concepts from neural representations with provable guarantees ★228 · Updated 4 months ago
- Automatically evaluate your LLMs in Google Colab ★631 · Updated last year
- Sample notebooks and prompts for LLM evaluation ★131 · Updated this week
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ★526 · Updated 11 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ★108 · Updated last year
- Open-Source Evaluation & Testing for Computer Vision AI systems ★28 · Updated 7 months ago
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… ★114 · Updated last year
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ★205 · Updated this week
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ★187 · Updated last year
- Continuous Integration for LLM-powered applications ★241 · Updated last year
- Awesome synthetic (text) datasets ★281 · Updated 7 months ago