aiverify-foundation / moonshot-data
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
☆24Updated last week
Alternatives and similar repositories for moonshot-data:
Users who are interested in moonshot-data are comparing it to the libraries listed below
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application.☆205Updated last week
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆79Updated last year
- Run safety benchmarks against AI models and view detailed reports showing how well they performed.☆75Updated this week
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models☆17Updated 6 months ago
- AI Verify☆130Updated this week
- ☆43Updated 2 weeks ago
- Weak-to-Strong Jailbreaking on Large Language Models☆72Updated 11 months ago
- Python package for measuring memorization in LLMs.☆137Updated 2 months ago
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"☆88Updated 10 months ago
- Code for watermarking language models☆76Updated 4 months ago
- Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873☆134Updated 8 months ago
- Official repository for "PostMark: A Robust Blackbox Watermark for Large Language Models"☆19Updated 5 months ago
- ☆34Updated last year
- ☆31Updated last year
- A lightweight library for large language model (LLM) jailbreaking defense.☆45Updated 3 months ago
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con…☆36Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆66Updated 11 months ago
- Papers about red teaming LLMs and Multimodal models.☆91Updated 2 months ago
- Dataset for the Tensor Trust project☆36Updated 10 months ago
- A Synthetic Dataset for Personal Attribute Inference (NeurIPS'24 D&B)☆32Updated 2 months ago
- ☆33Updated 2 months ago
- This repository provides an implementation to formalize and benchmark prompt injection attacks and defenses☆167Updated last week
- Official Repository for Dataset Inference for LLMs☆28Updated 6 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".☆65Updated 10 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…☆95Updated 9 months ago
- Training data extraction on GPT-2☆179Updated last year
- A collection of automated evaluators for assessing jailbreak attempts.☆102Updated this week
- ☆39Updated 5 months ago
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]☆61Updated last week
- This repository contains data, code and models for contextual noncompliance.☆19Updated 6 months ago