declare-lab/safety-arithmetic

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/declare-lab/safety-arithmetic)

declare-lab / safety-arithmetic

☆13

Alternatives and similar repositories for safety-arithmetic

Users that are interested in safety-arithmetic are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zhu-minjun / SafetyLock
View on GitHub
Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!
☆11Oct 16, 2024Updated last year
NeuralSentinel / SafeInfer
View on GitHub
☆23Jan 14, 2025Updated last year
ResearAI / MeOS
View on GitHub
Fork yourself as a Skill, so agents understand you better.
☆20Apr 8, 2026Updated 3 months ago
declare-lab / resta
View on GitHub
Restore safety in fine-tuned language models through task arithmetic
☆33Mar 28, 2024Updated 2 years ago
declare-lab / SAT
View on GitHub
Code for the EMNLP 2022 Findings short paper "SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Train…
☆12Feb 25, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
abhinavkashyap / domadapter
View on GitHub
Domain Adaptation and Adapters
☆16Feb 28, 2023Updated 3 years ago
declare-lab / WikiDes
View on GitHub
A Wikipedia-based summarization dataset
☆14Mar 27, 2023Updated 3 years ago
WENGSYX / CMIVQA_Baseline
View on GitHub
CMIVQA
☆18Jun 3, 2024Updated 2 years ago
WENGSYX / ControlLM
View on GitHub
ControlLM is a method to control the personality traits and behaviors of language models in real-time at inference without costly trainin…
☆21Nov 6, 2024Updated last year
declare-lab / VIP
View on GitHub
Our EMNLP 2022 paper on VIP-Based Prompting for Parameter-Efficient Learning
☆10Oct 22, 2022Updated 3 years ago
XuandongZhao / weak-to-strong
View on GitHub
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆90May 2, 2025Updated last year
qizhangli / Gradient-based-Jailbreak-Attacks
View on GitHub
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Nov 7, 2024Updated last year
tianyi-lab / C3PO
View on GitHub
[COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆21Apr 9, 2025Updated last year
sunlab-osu / ReasonBERT
View on GitHub
Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021
☆28Feb 1, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
declare-lab / KNOT
View on GitHub
This repository contains the implementation of the paper -- KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks
☆15Sep 15, 2022Updated 3 years ago
marcovito / sma
View on GitHub
☆15Aug 17, 2023Updated 2 years ago
tianyi-lab / RoMA
View on GitHub
Code for "Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs"
☆19Nov 6, 2025Updated 8 months ago
W-Wu / ERC-SLT22
View on GitHub
Code for "Distribution-based Emotion Recognition in Conversation"
☆18Feb 6, 2023Updated 3 years ago
DanielSc4 / Dynamic-Activation-Composition
View on GitHub
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14Nov 22, 2024Updated last year
yurujiang2003 / sparta
View on GitHub
NeurIPS 2025
☆15Feb 4, 2026Updated 5 months ago
declare-lab / della
View on GitHub
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
☆37Jul 12, 2024Updated 2 years ago
zyxnlp / ICL-Interpretation-Analysis-Resources
View on GitHub
Links to publications that focus on the interpretation and analysis of in-context learning
☆14Oct 17, 2024Updated last year
wangcunxiang / Can-PLM-Serve-as-KB-for-CBQA
View on GitHub
The code and data for ACL2021 paper <Can Generative Pre-trained Language Models Serve as Knowledge Bases for Closed-book QA?>
☆22Dec 18, 2022Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
open-compass / RePro
View on GitHub
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15Dec 5, 2025Updated 7 months ago
yuhanghe01 / RiTTA
View on GitHub
Event Relation in Text-to-Audio (TTA) Generation
☆21Feb 26, 2025Updated last year
declare-lab / M2H2-dataset
View on GitHub
This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For HumorRecogniti…
☆19Mar 14, 2023Updated 3 years ago
abhinavkashyap / dct
View on GitHub
Repository for the ACL'22 paper "So Different Yet So Alike! Constrained Unsupervised Text Style Transfer"
☆16Jan 19, 2024Updated 2 years ago
declare-lab / ferret
View on GitHub
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
☆19Aug 22, 2024Updated last year
susumuota / nano-askllm
View on GitHub
Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.
☆12Jun 19, 2024Updated 2 years ago
shadowkiller33 / Language_attack
View on GitHub
A repo for LLM jailbreak
☆14Sep 5, 2023Updated 2 years ago
youngwanLEE / holisafe
View on GitHub
[CVPR Findings 2026] HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model
☆17Mar 8, 2026Updated 4 months ago
amayuelas / multi-agent-attack
View on GitHub
MutliAgent Attack
☆15Oct 3, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
oshears / adv-ml-2020-snn-project
View on GitHub
Advanced Machine Learning Fall 2020 Project Repository
☆12Dec 12, 2020Updated 5 years ago
rain152 / PAT
View on GitHub
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆11Oct 29, 2024Updated last year
DYR1 / MoGU
View on GitHub
Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.
☆18Jan 14, 2025Updated last year
declare-lab / DoubleMix
View on GitHub
Code for the COLING 2022 paper "DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification"
☆19Oct 19, 2022Updated 3 years ago
zubayerhimel / kanban-with-tailwindcss
View on GitHub
Kanban board made with TailwindCSS
☆11Jun 10, 2021Updated 5 years ago
Re-Align / AlignTDS
View on GitHub
Analyzing LLM Alignment via Token distribution shift
☆17Jan 26, 2024Updated 2 years ago
U-C4N / Deepseek-CoT
View on GitHub
Deepseek-CoT
☆10Oct 6, 2024Updated last year