aiverify-foundation / moonshot-data
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
☆39 · Updated 2 weeks ago
Alternatives and similar repositories for moonshot-data
Users interested in moonshot-data are comparing it to the libraries listed below.
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ☆306 · Updated this week
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆117 · Updated this week
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆53 · Updated last year
- Code for the paper "Fishing for Magikarp" ☆179 · Updated 8 months ago
- ☆39 · Updated 2 years ago
- An open-source compliance-centered evaluation framework for Generative AI models ☆178 · Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆128 · Updated 3 months ago
- ☆43 · Updated last year
- AI Verify ☆46 · Updated 2 weeks ago
- ☆34 · Updated last year
- Red-Teaming Language Models with DSPy ☆250 · Updated 11 months ago
- [ICLR 2025] 🚀 CodeMMLU Evaluator: A framework for evaluating language models on the CodeMMLU MCQ benchmark. ☆29 · Updated 9 months ago
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆70 · Updated 4 months ago
- Papers about red teaming LLMs and Multimodal models. ☆159 · Updated 8 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆104 · Updated last year
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆310 · Updated last year
- Reward Model framework for LLM RLHF ☆62 · Updated 2 years ago
- ☆50 · Updated last year
- Evaluating LLMs with fewer examples ☆169 · Updated last year
- ☆50 · Updated last year
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Updated last year
- autoredteam: code for training models that automatically red team other language models ☆15 · Updated 2 years ago
- A simple evaluation of generative language models and safety classifiers. ☆85 · Updated last month
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆184 · Updated 10 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆174 · Updated last week
- A curated list of materials on AI guardrails ☆45 · Updated 7 months ago
- Benchmarking Large Language Models ☆105 · Updated 7 months ago
- The code and data for "Are Large Pre-Trained Language Models Leaking Your Personal Information?" (Findings of EMNLP '22) ☆27 · Updated 3 years ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆79 · Updated 5 months ago
- Collection of evals for Inspect AI ☆349 · Updated this week