A simple evaluation of generative language models and safety classifiers.
☆99 · Apr 15, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for safety-eval
Users who are interested in safety-eval are comparing it to the libraries listed below.
- An official codebase for "NormLens: Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Comm… ☆10 · May 9, 2024 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆121 · Dec 2, 2024 · Updated last year
- This repository contains code for the paper "Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a… ☆13 · Jun 11, 2022 · Updated 3 years ago
- Corpus to accompany: "Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding" ☆11 · Apr 11, 2025 · Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆25 · Nov 29, 2024 · Updated last year
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Official Repo for the Paper "AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution o… ☆26 · Jan 12, 2025 · Updated last year
- Official code for FAccT'21 paper "Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning" https://arxiv.org/abs… ☆13 · Mar 9, 2021 · Updated 5 years ago
- An official codebase for paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos (ICCV 23)" ☆52 · Aug 13, 2023 · Updated 2 years ago
- Code, data, models for the Sherlock corpus ☆61 · Nov 11, 2022 · Updated 3 years ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆175 · Mar 8, 2025 · Updated last year
- Code for the paper "Pretrained Models for Multilingual Federated Learning" at NAACL 2022 ☆11 · Aug 9, 2022 · Updated 3 years ago
- ☆40 · May 17, 2025 · Updated 11 months ago
- [JMLR] Gradual Domain Adaptation: Theory and Algorithms ☆11 · Jan 14, 2025 · Updated last year
- Official repository for "DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation (ACL2023 Findings)" ☆11 · May 23, 2023 · Updated 2 years ago
- Source code for paper "PRiSM: Enhancing Low-Resource Document-Level Relation Extraction with Relation-Aware Score Calibration", Findings … ☆11 · Jun 20, 2025 · Updated 10 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆49 · Dec 14, 2023 · Updated 2 years ago
- EMNLP 2020: Personalized Dialog Generation with Commonsense ☆18 · Oct 12, 2022 · Updated 3 years ago
- Official implementation of BPA (CVPR 2022) ☆13 · Jun 17, 2022 · Updated 3 years ago
- ☆40 · Dec 19, 2024 · Updated last year
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies ☆30 · Aug 14, 2024 · Updated last year
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆90 · Mar 30, 2025 · Updated last year
- Repository containing dataset, models and code associated with the CHIME project ☆17 · Aug 22, 2024 · Updated last year
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs ☆46 · Feb 11, 2026 · Updated 2 months ago
- Official repository for "Reweighting Strategy based on Synthetic Data Identification for Sentence Similarity (COLING2022)" ☆18 · Sep 4, 2022 · Updated 3 years ago
- [MICCAI'21] Personalized Retrogress-Resilient Framework for Real-World Medical Federated Learning ☆17 · Mar 23, 2022 · Updated 4 years ago
- ☆12 · Apr 24, 2024 · Updated 2 years ago
- ☆29 · Feb 24, 2025 · Updated last year
- Code for Representation Bending Paper ☆17 · Jul 15, 2025 · Updated 9 months ago
- Official implementation of OpenTab (ICLR2024) ☆13 · Mar 27, 2024 · Updated 2 years ago
- ☆13 · Jun 4, 2024 · Updated last year
- Kubernetes cli (kubectl) powered by GPT ☆15 · Apr 20, 2023 · Updated 3 years ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆96 · Apr 14, 2026 · Updated 3 weeks ago
- Source code for Paper "Legal Feature Enhanced Semantic Matching Network for Similar Case Matching". ☆15 · Feb 17, 2020 · Updated 6 years ago
- Twitter Clone Coding with Firebase, Typescript, React Router and Styled Component v.2023 ☆11 · Aug 16, 2023 · Updated 2 years ago
- A minimalist Twitter-style HUMAN-FREE social media with AI bots. ☆12 · Jul 29, 2024 · Updated last year
- Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆283 · Jul 28, 2025 · Updated 9 months ago
- Code release for "Understanding Bias in Large-Scale Visual Datasets" ☆23 · Dec 4, 2024 · Updated last year
- NSMC, KorSTS ... fine-tunings ☆18 · Feb 23, 2022 · Updated 4 years ago