Li-Hyn / LLM_CatastrophicForgetting

Code for LLM_Catastrophic_Forgetting via SAM.

☆11

Alternatives and similar repositories for LLM_CatastrophicForgetting

Users who are interested in LLM_CatastrophicForgetting are comparing it to the repositories listed below.

avalonstrel / Mitigating-the-Alignment-Tax-of-RLHF
☆15 Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69 Updated last year
declare-lab / resta
Restore safety in fine-tuned language models through task arithmetic
☆29 Updated last year
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆37 Updated 4 months ago
princeton-nlp / benign-data-breaks-safety
☆41 Updated last year
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆27 Updated last year
BeyonderXX / TRACE
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
☆79 Updated last year
Yangyi-Chen / PaperList-Trustworthy-Applications
Mostly recording papers about models' trustworthy applications. Intending to include topics like model evaluation & analysis, security, c…
☆21 Updated 2 years ago
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 Updated last year
ShuheSH / A-Survey-of-the-Reasoning-Abilities-of-LLMs
☆25 Updated 7 months ago
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆49 Updated 11 months ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆30 Updated last year
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆74 Updated 11 months ago
circle-hit / SAPT
Code for ACL 2024 accepted paper titled "SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language …
☆36 Updated 9 months ago
zijian678 / TDD
☆11 Updated last year
snw2021 / LLM_Unlearning_Papers
☆26 Updated last year
liyucheng09 / Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
☆52 Updated last year
decoding-comp-trust / comp-trust
Codebase for decoding compressed trust.
☆24 Updated last year
xiatingyu / SFT-DataSelection-at-scale
☆30 Updated 8 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆70 Updated 2 years ago
tml-epfl / long-is-more-for-alignment
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
☆19 Updated last year
XMUDeepLIT / SSR
Code for "Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal" (ACL 2024)
☆15 Updated 11 months ago
Princeton-SysML / kNNLM_privacy
Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888
☆36 Updated last year
DYR1 / MoGU
Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.
☆18 Updated 9 months ago
ZHZisZZ / weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
☆62 Updated 10 months ago
nayeon7lee / FactualityPrompt
☆86 Updated 2 years ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆60 Updated 10 months ago
Shark-NLP / self-adaptive-ICL
Self-adaptive in-context learning
☆45 Updated 2 years ago
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99 Updated last year
ADaM-BJTU / W2SG
The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”
☆17 Updated last year