uiuc-focal-lab / LLMCert-BLinks

A certifier for bias in LLMs

☆24

Alternatives and similar repositories for LLMCert-B

Users that are interested in LLMCert-B are comparing it to the libraries listed below

Sorting:

purpcode-uiuc / purpcode
🔮Reasoning for Safer Code Generation; 🥇Winner Solution of Amazon Nova AI Challenge 2025
☆28Updated 2 months ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆72Updated 3 months ago
ejones313 / auditing-llms
☆58Updated 2 years ago
ise-uiuc / xft
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
☆35Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116Updated 8 months ago
rishub-tamirisa / tamper-resistance
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
☆62Updated 4 months ago
tml-epfl / os-harm
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [NeurIPS 2025 Spotlight]
☆38Updated last month
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆89Updated last year
amazon-science / recode
Releasing code for "ReCode: Robustness Evaluation of Code Generation Models"
☆52Updated last year
PurCL / ProSec
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"
☆15Updated 7 months ago
milesaturpin / cot-unfaithfulness
☆49Updated 2 years ago
lt-asset / REPOCOD
For our ACL25 Paper: Can Language Models Replace Programmers? RepoCod Says ‘Not Yet’ - by Shanchao Liang and Yiran Hu and Nan Jiang and L…
☆22Updated 2 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆114Updated last month
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99Updated last year
SolidShen / RIPPLE_official
☆20Updated last year
facebookresearch / advprompter
Official implementation of AdvPrompter https//arxiv.org/abs/2404.16873
☆169Updated last year
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆154Updated last year
eth-sri / SafeCoder
☆48Updated last year
YihongDong / CDD-TED4LLMs
☆15Updated 11 months ago
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆238Updated last year
thunlp / DebugBench
The repository for paper "DebugBench: "Evaluating Debugging Capability of Large Language Models".
☆83Updated last year
ryoungj / ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
☆167Updated last year
FudanSELab / ClassEval
Benchmark ClassEval for class-level code generation.
☆145Updated last year
floatai / HumanEval-XL
[LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
☆38Updated 7 months ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆139Updated 11 months ago
reddy-lab-code-research / PPOCoder
Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"
☆117Updated last year
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆160Updated 6 months ago
AI-secure / RedCode
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
☆52Updated 3 months ago
iamgroot42 / mimir
Python package for measuring memorization in LLMs.
☆172Updated 3 months ago
evo-eval / evoeval
EvoEval: Evolving Coding Benchmarks via LLM
☆76Updated last year