aiverify-foundation / moonshot-data
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
☆36 · Updated this week
Alternatives and similar repositories for moonshot-data
Users interested in moonshot-data are comparing it to the libraries listed below
- ☆34 · Updated 8 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆97 · Updated this week
- ☆41 · Updated last year
- Code for the paper "Fishing for Magikarp" ☆162 · Updated 2 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆91 · Updated last month
- Red-Teaming Language Models with DSPy ☆203 · Updated 5 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Updated 11 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆225 · Updated 10 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆44 · Updated 10 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ☆262 · Updated this week
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆165 · Updated 4 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆299 · Updated 10 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆132 · Updated 2 months ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a… ☆62 · Updated 4 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆270 · Updated last year
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists ☆29 · Updated 4 months ago
- Papers about red teaming LLMs and multimodal models. ☆131 · Updated 2 months ago
- ☆45 · Updated last year
- ☆216 · Updated 4 years ago
- Collection of evals for Inspect AI ☆198 · Updated this week
- Evaluating LLMs with fewer examples ☆160 · Updated last year
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆72 · Updated 6 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆87 · Updated 8 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆324 · Updated 6 months ago
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆204 · Updated this week
- A method for steering LLMs to better follow instructions ☆48 · Updated 3 weeks ago
- NeurIPS'24 - LLM Safety Landscape ☆25 · Updated 5 months ago
- ☆36 · Updated 2 years ago
- A simple evaluation of generative language models and safety classifiers. ☆58 · Updated last year