zjunlp/LookAheadTuning

[WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews

☆17

Alternatives and similar repositories for LookAheadTuning

Users who are interested in LookAheadTuning are comparing it to the libraries listed below.

LLLeoLi / LARF
[EMNLP 2025] Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
☆15 · Jul 22, 2025 · Updated last year
aladinD / SafeMERGE
Code for SafeMERGE (ICLR 2025).
☆15 · Apr 1, 2025 · Updated last year
init0xyz / AdaCQR
Implementation of AdaCQR (COLING 2025)
☆15 · Dec 30, 2024 · Updated last year
avalonstrel / Mitigating-the-Alignment-Tax-of-RLHF
☆16 · Feb 8, 2024 · Updated 2 years ago
zjunlp / KnowUnDo
[EMNLP 2024] To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
☆48 · Jan 23, 2025 · Updated last year
homles11 / SaLoRA
Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation” (ICLR 2025)
☆29 · Oct 23, 2025 · Updated 8 months ago
OPPO-PersonalAI / PersonalizedDeepResearchBench
☆24 · Jan 27, 2026 · Updated 5 months ago
SophieZheng998 / ALI-Agent
Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"
☆21 · Jan 31, 2026 · Updated 5 months ago
zjunlp / CaKE
[EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners
☆19 · Nov 17, 2025 · Updated 8 months ago
prafulla77 / TAC-KBP-2017-Participation
☆12 · Jun 7, 2019 · Updated 7 years ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024)
☆51 · Jan 15, 2026 · Updated 6 months ago
OceanGPT / OceanGym
OceanGym: A Benchmark Environment for Underwater Embodied Agents
☆133 · Jul 3, 2026 · Updated 2 weeks ago
zjunlp / predict-before-execute
Can We Predict Before Executing Machine Learning Agents?
☆19 · Jul 7, 2026 · Updated 2 weeks ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆24 · Dec 8, 2024 · Updated last year
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆165 · Jun 22, 2026 · Updated last month
MAS-KE / ICDM_2020_KGC
Consumer Event Cause Extraction Baseline Model
☆16 · Aug 3, 2020 · Updated 5 years ago
ethz-spylab / jailbreak-tax
☆24 · Feb 17, 2026 · Updated 5 months ago
zjunlp / LREBench
[EMNLP 2022 Findings] Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study
☆34 · Feb 23, 2024 · Updated 2 years ago
zhiyuanhubj / AAAI-19_slide_poster
☆21 · Jan 15, 2019 · Updated 7 years ago
BAI-LAB / BaiJia
[WWW 2026] BaiJia: An Open Role-Playing Platform of Chinese Historical Characters
☆28 · Jan 14, 2026 · Updated 6 months ago
irecsys / Tutorial_MSRS
Tutorial for Multi-Stakeholder Recommender Systems
☆22 · Aug 23, 2021 · Updated 4 years ago
CERT-Lab / fed-sb
(TMLR J2C Certification) Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tu…
☆27 · Oct 4, 2025 · Updated 9 months ago
letsgoLakers / NCIFD
An ethnic culture dataset for large language models
☆13 · May 26, 2025 · Updated last year
Alan-Qin / Transfer_attack_RAP
Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation (NeurIPS 2022)
☆33 · Dec 16, 2022 · Updated 3 years ago
princeton-nlp / benign-data-breaks-safety
☆47 · Oct 1, 2024 · Updated last year
choidami / inductive-oocr
☆16 · Mar 22, 2025 · Updated last year
zhu-minjun / SafetyLock
Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!
☆11 · Oct 16, 2024 · Updated last year
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13 · Jun 20, 2025 · Updated last year
alipay / RJU_Ant_QA
The RJUA-QA (RenJi hospital department of Urology and Antgroup collaborative Question and Answer dataset) is an innovative medical urolog…
☆60 · Apr 22, 2024 · Updated 2 years ago
rmin2000 / adv_tracing
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10 · Jul 15, 2024 · Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11 · Feb 7, 2026 · Updated 5 months ago
NLP-Tutorials / AACL-IJCNLP2022-KGC-Tutorial
Materials for AACL-IJCNLP-2022 tutorial: Efficient and Robust Knowledge Graph Construction
☆28 · Feb 3, 2023 · Updated 3 years ago
git-disl / Virus
This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
☆56 · Feb 2, 2025 · Updated last year
IBM / NeuralFuse
[NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes" by Hao-Lun …
☆10 · Sep 18, 2025 · Updated 10 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
lapisrocks / DiscreteAdversarialDistillation
[NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"
☆11 · Jun 18, 2024 · Updated 2 years ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54 · Oct 18, 2024 · Updated last year
2003pro / TAGCOS
This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
☆13 · Jul 21, 2024 · Updated 2 years ago
xypan0 / G-DIG
☆12 · Jun 30, 2024 · Updated 2 years ago