aiverify-foundation / moonshot-data
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
☆39 · Updated this week
Alternatives and similar repositories for moonshot-data
Users who are interested in moonshot-data are comparing it to the repositories listed below.
- Code for the paper "Fishing for Magikarp" ☆179 · Updated 8 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆54 · Updated last year
- ☆34 · Updated last year
- ☆43 · Updated last year
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆130 · Updated 4 months ago
- AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness ☆103 · Updated last year
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆117 · Updated last week
- [ICLR 2025] 🚀 CodeMMLU Evaluator: A framework for evaluating LM models on CodeMMLU MCQs benchmark. ☆29 · Updated 9 months ago
- ☆49 · Updated 10 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- Red-Teaming Language Models with DSPy ☆250 · Updated 11 months ago
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ☆308 · Updated last week
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Updated last year
- ☆50 · Updated last year
- ☆65 · Updated last week
- Codebase release for EMNLP2023 paper publication ☆19 · Updated 4 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 · Updated last year
- Evaluating LLMs with fewer examples ☆169 · Updated last year
- Reward Model framework for LLM RLHF ☆62 · Updated 2 years ago
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆72 · Updated 4 months ago
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists ☆31 · Updated 5 months ago
- ☆91 · Updated last month
- Collection of evals for Inspect AI ☆357 · Updated this week
- Open Implementations of LLM Analyses ☆107 · Updated last year
- Papers about red teaming LLMs and Multimodal models. ☆160 · Updated 8 months ago
- ☆44 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆184 · Updated 10 months ago
- A simple evaluation of generative language models and safety classifiers. ☆85 · Updated last month
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated last year