Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!
☆11 · Oct 16, 2024 · Updated last year
Alternatives and similar repositories for SafetyLock
Users that are interested in SafetyLock are comparing it to the libraries listed below
- Personality Alignment of Language Models ☆53 · Jul 1, 2025 · Updated 8 months ago
- ☆24 · Jun 17, 2025 · Updated 8 months ago
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting ☆20 · Mar 25, 2024 · Updated last year
- The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation. ☆27 · Nov 4, 2024 · Updated last year
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.… ☆28 · Sep 25, 2024 · Updated last year
- ☆34 · Feb 6, 2026 · Updated 3 weeks ago
- List your bounties, top contributors, org stats & more ☆31 · Nov 4, 2024 · Updated last year
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆90 · Jan 29, 2024 · Updated 2 years ago
- Code for running experiments and benchmarking on GNNExplainer: Generating Explanations for Graph Neural Networks ☆15 · May 8, 2021 · Updated 4 years ago
- NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis ☆31 · Jan 21, 2023 · Updated 3 years ago
- Prompt & model versioning on the cloud ☆10 · Jun 22, 2024 · Updated last year
- ☆11 · Nov 8, 2023 · Updated 2 years ago
- Math24o: High School Olympiad Mathematics Competition Evaluation Set (High School Olympiad Mathematics Chinese Benchmark) ☆11 · Mar 27, 2025 · Updated 11 months ago
- ☆38 · Oct 2, 2024 · Updated last year
- ☆39 · May 21, 2024 · Updated last year
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models. ☆13 · Dec 16, 2024 · Updated last year
- now-defunct fork of three20 -- please see facebook/three20 for most/all purposes ☆17 · Aug 20, 2010 · Updated 15 years ago
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking ☆13 · Sep 8, 2023 · Updated 2 years ago
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews ☆17 · Dec 14, 2025 · Updated 2 months ago
- Apps for Mechanix OS ☆12 · Jan 26, 2024 · Updated 2 years ago
- Minimalist version of probml/rebayes ☆10 · Sep 15, 2025 · Updated 5 months ago
- Code associated with ICML (2024). "Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normaliz… ☆10 · Feb 22, 2026 · Updated last week
- Mac OS OSAKit adapted for Rust ☆15 · Mar 1, 2025 · Updated last year
- Code and Data for EMNLP 2023 Paper "MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Langu… ☆14 · Apr 7, 2025 · Updated 10 months ago
- ☆10 · Mar 18, 2023 · Updated 2 years ago
- Compendium of all the important OS concepts and key points. https://applied-programming.github.io/Operating-Systems-Notes/ ☆11 · Aug 13, 2017 · Updated 8 years ago
- Library for computing the Finite-time Lyapunov Exponents of 2D flows using xarray ☆10 · Apr 25, 2022 · Updated 3 years ago
- On-the-fly Table Generation - SIGIR'18 ☆10 · Feb 1, 2020 · Updated 6 years ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ☆43 · Feb 12, 2025 · Updated last year
- A toolkit for testing and improving named entity recognition [ESEC/FSE'23] ☆11 · Aug 31, 2023 · Updated 2 years ago
- ☆16 · Jan 23, 2026 · Updated last month
- ☆11 · Nov 8, 2022 · Updated 3 years ago
- Welcome to the Sec+ 701 Study Guide repository! This collection provides materials for the CompTIA Security+ (SY0-701) exam, including at… ☆16 · Nov 30, 2024 · Updated last year
- DOOM port to raylib ☆14 · Jul 13, 2020 · Updated 5 years ago
- Flux reconstruction fluid flow solver for 1D PDEs written in Julia. Linear advection, Burgers, viscous Burgers, and Euler equations. ☆13 · Apr 28, 2022 · Updated 3 years ago
- embedded Perl 5 interpreter in Haskell, forked from https://github.com/perl6/Pugs.hs. Candidate package on hackage at https://hackage.has… ☆12 · Feb 7, 2021 · Updated 5 years ago
- ☆13 · Jul 12, 2024 · Updated last year
- Python implementation of Gnutella for CS 114 P2P systems ☆12 · Mar 23, 2012 · Updated 13 years ago
- Code for COLING 2022 paper "FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition" ☆15 · Jan 15, 2023 · Updated 3 years ago