justincui03/or-bench

[ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"

☆28

Alternatives and similar repositories for or-bench

Users who are interested in or-bench are comparing it to the repositories listed below.

BatsResearch / cross-lingual-detox
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024
☆18 · Mar 25, 2025 · Updated last year
dtch1997 / steering-bench
Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"
☆22 · Dec 14, 2024 · Updated last year
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆18 · Sep 10, 2024 · Updated last year
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆26 · Nov 29, 2024 · Updated last year
songys / 2021Langcon
☆11 · Oct 3, 2021 · Updated 4 years ago
ethz-spylab / superhuman-ai-consistency
☆30 · Jun 19, 2023 · Updated 3 years ago
jaeho-lee / oce
Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)
☆11 · Oct 15, 2020 · Updated 5 years ago
shaoshuo-ss / Awesome-LLM-Fingerprinting
Paper list of LLM fingerprinting, based on our paper titled "SoK: Large Language Model Copyright Auditing via Fingerprinting".
☆29 · Aug 28, 2025 · Updated 10 months ago
RYC-98 / FPR
Official codes for FPR (Accepted by CVPR2025)
☆15 · Mar 19, 2025 · Updated last year
ml-postech / multi-armed-bandit-algorithm-against-strategic-replication
Official implementation of "Multi-armed Bandit Algorithm against Strategic Replication"
☆14 · May 17, 2022 · Updated 4 years ago
Pi3AI / DreamGym
This is AI implementation (not official) of the DreamGym framework from the paper "Scaling Agent Learning via Experience Synthesis" (arXi…
☆44 · Nov 9, 2025 · Updated 8 months ago
iwhwang / SelecMix
SelecMix: Debiased Learning by Contradicting-pair Sampling (NeurIPS 2022)
☆13 · Jun 5, 2024 · Updated 2 years ago
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆131 · Dec 2, 2024 · Updated last year
jwf40 / Information-Theoretic-Unlearning
Code for the paper 'An Information Theoretic Approach to Machine Unlearning'
☆24 · Mar 20, 2025 · Updated last year
horizon-llm / Think-RM
[NeurIPS 2025] Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
☆17 · Nov 2, 2025 · Updated 8 months ago
YichenZW / Robust-Det
The code implementation of the paper Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks (A…
☆13 · Jul 16, 2024 · Updated 2 years ago
salman-lui / x-teaming
☆68 · May 21, 2025 · Updated last year
NickyFot / ACMMM22_LearningLabelRelationships
☆11 · Jun 20, 2023 · Updated 3 years ago
utrerf / robust_transfer_learning
Accelerating Transfer Learning with Robust Neural Nets
☆11 · Oct 2, 2020 · Updated 5 years ago
xiaosen-wang / TA
Triangle Attack: A Query-efficient Decision-based Adversarial Attack (ECCV 2022)
☆16 · Jul 19, 2022 · Updated 4 years ago
ml-postech / robust-deep-learning-from-crowds-with-belief-propagation
Official PyTorch implementation of "Robust Deep Learning from Crowds with Belief Propagation"
☆19 · Mar 22, 2022 · Updated 4 years ago
YichenZW / Pacing
This repository includes the code implementation of the paper Improving Pacing in Long-Form Story Planning by Yichen Wang, Kevin Yang, Xi…
☆19 · Nov 19, 2024 · Updated last year
microsoft / RTP-LX
Repository for the paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?"
☆29 · May 1, 2025 · Updated last year
ruchikachavhan / concept-prune
Code for the paper - ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
☆24 · Aug 13, 2024 · Updated last year
eth-lre / LLM_ICL
ACL24
☆11 · Jun 7, 2024 · Updated 2 years ago
drimpossible / corrective-unlearning-bench
☆24 · Dec 17, 2025 · Updated 7 months ago
fjzzq2002 / random_transformers
Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024)
☆15 · Sep 28, 2024 · Updated last year
Gao-zy26 / ReToMe-VA
[ACM MM 2024] ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack
☆14 · Dec 20, 2024 · Updated last year
YichenZW / awesome-llm-diversity
A curated collection of research papers exploring diversity in Large Language Model text generation. This repository tracks cutting-edge …
☆15 · Jun 19, 2026 · Updated last month
KempnerInstitute / llm_uncertainty
Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"
☆11 · Updated this week
TrustAI-laboratory / Many-Shot-Jailbreaking-Demo
Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…
☆17 · Aug 6, 2024 · Updated last year
leileqiTHU / Attacker
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13 · Apr 23, 2025 · Updated last year
epfml / pam
☆16 · Dec 9, 2023 · Updated 2 years ago
aditeyabaral / pydictionary
PyDictionary is an offline English dictionary made using Python along with the Wordnet Lexical Database and Enchant Spell Dictionary. The…
☆20 · May 16, 2021 · Updated 5 years ago
osiriszjq / RobustPPE
Robust Point Cloud Processing through Positional Embedding
☆14 · Sep 7, 2023 · Updated 2 years ago
xiaosen-wang / SIT
[ICCV 2023] Structure Invariant Transformation for better Adversarial Transferability
☆24 · Feb 23, 2024 · Updated 2 years ago
listen0425 / Safety-Layers
Code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆25 · Apr 26, 2025 · Updated last year
lliu606 / COSMOS
☆20 · Feb 2, 2026 · Updated 5 months ago
CassiniHuy / image-low-pass-filters-pytorch
Low-pass filtering for images implemented in PyTorch, including ideal, Butterworth, and Gaussian filters.
☆22 · May 19, 2025 · Updated last year