declare-lab/resta

Restore safety in fine-tuned language models through task arithmetic

☆33

Alternatives and similar repositories for resta

Users that are interested in resta are comparing it to the libraries listed below.

declare-lab / VIP
Our EMNLP 2022 paper on VIP-Based Prompting for Parameter-Efficient Learning
☆10 · Oct 22, 2022 · Updated 3 years ago
declare-lab / identifiable-transformers
☆22 · Mar 16, 2023 · Updated 3 years ago
declare-lab / KNOT
This repository contains the implementation of the paper -- KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks
☆15 · Sep 15, 2022 · Updated 3 years ago
declare-lab / SAT
Code for the EMNLP 2022 Findings short paper "SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Train…
☆12 · Feb 25, 2023 · Updated 3 years ago
declare-lab / red-instruct
Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
☆111 · Mar 8, 2024 · Updated 2 years ago
tanganke / subspace_fusion
Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"
☆14 · Mar 28, 2024 · Updated 2 years ago
declare-lab / DoubleMix
Code for the COLING 2022 paper "DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification"
☆19 · Oct 19, 2022 · Updated 3 years ago
declare-lab / ferret
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
☆19 · Aug 22, 2024 · Updated last year
declare-lab / safety-arithmetic
☆13 · Jan 14, 2025 · Updated last year
declare-lab / speech-adapters
Codes and datasets for our ICASSP2023 paper, Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech und…
☆43 · Mar 12, 2023 · Updated 3 years ago
declare-lab / HyperTTS
☆40 · Apr 15, 2024 · Updated 2 years ago
LLLeoLi / LARF
[EMNLP 2025] Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
☆15 · Jul 22, 2025 · Updated 11 months ago
walledai / walledeval
Test LLMs against jailbreaks and unprecedented harms
☆40 · Oct 19, 2024 · Updated last year
Confirm-Solutions / flrt
Fluent student-teacher redteaming
☆23 · Jul 25, 2024 · Updated last year
jqnap / Twitter-Occupation-Prediction
Code and data accompanying paper "Twitter Homophily: Network Based Prediction of User’s Occupation"
☆19 · Jul 23, 2020 · Updated 5 years ago
IBM / SafeLoRA
Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"
☆29 · Dec 21, 2025 · Updated 6 months ago
git-disl / Safety-Tax
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆35 · Mar 11, 2025 · Updated last year
declare-lab / sentence-ordering
This repository contains the PyTorch implementation of the paper STaCK: Sentence Ordering with Temporal Commonsense Knowledge appearing a…
☆28 · Mar 15, 2023 · Updated 3 years ago
declare-lab / della
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
☆37 · Jul 12, 2024 · Updated 2 years ago
declare-lab / exemplary-empathy
This repository contains the source codes of the paper -- Exemplars-guided Empathetic Response Generation Controlled by the Elements of H…
☆25 · Feb 1, 2023 · Updated 3 years ago
allanchen95 / IJCAI-21-WhoIsWho-baseline
☆13 · Jul 15, 2021 · Updated 5 years ago
jiahaolu97 / anything-unsegmentable
(CVPR 2024) "Unsegment Anything by Simulating Deformation"
☆29 · May 27, 2024 · Updated 2 years ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆24 · Dec 8, 2024 · Updated last year
ecmonsen / gendered_words
Dictionary of English words tagged with their natural gender.
☆13 · Sep 7, 2021 · Updated 4 years ago
Aatlantise / syntactic-augmentation-nli
Create augmentation examples from MultiNLI by subject-object inversion and passivizing.
☆17 · Feb 22, 2021 · Updated 5 years ago
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆26 · Nov 29, 2024 · Updated last year
declare-lab / dialogue-understanding
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empiric…
☆128 · Mar 14, 2023 · Updated 3 years ago
git-disl / Booster
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…
☆40 · Mar 22, 2025 · Updated last year
abhinavkashyap / domadapter
Domain Adaptation and Adapters
☆16 · Feb 28, 2023 · Updated 3 years ago
eujhwang / personalized-llms
personalized-llms with allen institute
☆13 · Jun 22, 2023 · Updated 3 years ago
declare-lab / WikiDes
A Wikipedia-based summarization dataset
☆14 · Mar 27, 2023 · Updated 3 years ago
VITA-Group / PrAC-LTH
[ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang
☆26 · Dec 30, 2021 · Updated 4 years ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51 · Jan 15, 2026 · Updated 6 months ago
declare-lab / CICERO
The purpose of this repository is to introduce new dialogue-level commonsense inference datasets and tasks. We chose dialogues as the dat…
☆64 · Mar 14, 2023 · Updated 3 years ago
GXimingLu / IPA
Codebase for Inference-Time Policy Adapters
☆25 · Nov 3, 2023 · Updated 2 years ago
uw-nsl / ArtPrompt
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆102 · Aug 15, 2025 · Updated 11 months ago
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆108 · May 20, 2025 · Updated last year
poloclub / llm-landscape
NeurIPS'24 - LLM Safety Landscape
☆40 · Oct 21, 2025 · Updated 8 months ago
declare-lab / MSA-Robustness
NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis
☆31 · Jan 21, 2023 · Updated 3 years ago