uiuc-focal-lab / LLMCert-B
A certifier for bias in LLMs
☆24 · Updated last month
Alternatives and similar repositories for LLMCert-B
Users interested in LLMCert-B are comparing it to the repositories listed below.
- Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025) ☆53 · Updated 2 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆137 · Updated 7 months ago
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep" ☆119 · Updated 3 weeks ago
- SatLM: SATisfiability-Aided Language Models using Declarative Prompting (NeurIPS 2023) ☆48 · Updated 9 months ago
- [ICLR 2025] Official repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆55 · Updated 2 months ago
- ☆67 · Updated last year
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆36 · Updated 2 weeks ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Updated 9 months ago
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment" ☆14 · Updated last month
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts ☆31 · Updated 10 months ago
- Efficient and general syntactical decoding for Large Language Models ☆265 · Updated this week
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models" ☆74 · Updated 10 months ago
- [ICLR 2024] Showing properties of safety tuning and exaggerated safety. ☆82 · Updated last year
- This is the official implementation of "Lyra: Orchestrating Dual Correction in Automated Theorem Proving" ☆16 · Updated 10 months ago
- ☆23 · Updated 7 months ago
- ☆54 · Updated 2 years ago
- Official repository for Dataset Inference for LLMs ☆33 · Updated 9 months ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs ☆13 · Updated 2 months ago
- ☆14 · Updated 11 months ago
- ☆20 · Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆93 · Updated 11 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆76 · Updated last month
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ☆72 · Updated 2 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆39 · Updated 6 months ago
- This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition. ☆86 · Updated 11 months ago
- ☆36 · Updated 7 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation ☆93 · Updated last month
- A Synthetic Dataset for Personal Attribute Inference (NeurIPS'24 D&B) ☆40 · Updated 5 months ago
- Making code editing up to 7.7x faster using multi-layer speculation ☆20 · Updated 2 months ago
- The CodeInsight dataset is designed for code generation tasks, providing developers with expert-curated examples that bridge the gap betw… ☆12 · Updated 6 months ago