aiverify-foundation / moonshot-data
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
☆37 · Updated last month
Alternatives and similar repositories for moonshot-data
Users interested in moonshot-data are comparing it to the libraries listed below.
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆107 · Updated this week
- ☆35 · Updated 11 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆119 · Updated last week
- ☆43 · Updated last year
- A simple evaluation of generative language models and safety classifiers. ☆69 · Updated this week
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming" ☆46 · Updated last year
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆145 · Updated 4 months ago
- Code for the paper "Fishing for Magikarp" ☆170 · Updated 5 months ago
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆211 · Updated this week
- ☆38 · Updated 2 years ago
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ☆278 · Updated last month
- ☆39 · Updated last year
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆294 · Updated last year
- AI Verify ☆35 · Updated last week
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆85 · Updated last year
- Papers about red teaming LLMs and multimodal models. ☆144 · Updated 4 months ago
- Red-Teaming Language Models with DSPy ☆219 · Updated 8 months ago
- AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness ☆101 · Updated 8 months ago
- A re-implementation of the "Extracting Training Data from Large Language Models" paper by Carlini et al., 2020 ☆36 · Updated 3 years ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆171 · Updated 6 months ago
- Collection of evals for Inspect AI ☆254 · Updated this week
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆91 · Updated 10 months ago
- ☆47 · Updated last year
- A curated list of materials on AI guardrails ☆40 · Updated 4 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆238 · Updated last year
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆303 · Updated last year
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆63 · Updated last month
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆71 · Updated 2 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆107 · Updated 2 years ago