xqlin98 / INSTINCT

This is the official implementation of the paper "Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers".

☆34

Related projects:

ZO-Bench / ZO-LLM
[ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".
☆62 Updated 2 months ago
abhishekpanigrahi1996 / Skill-Localization-by-grafting
☆38 Updated 8 months ago
boyiwei / alignment-attribution-code
Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆55 Updated 2 months ago
gortizji / tangent_task_arithmetic
Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models".
☆79 Updated last year
locuslab / tofu
Landing Page for TOFU
☆79 Updated 3 months ago
Kaffaljidhmah2 / Arxiv-Recommender
☆40 Updated 10 months ago
YefanZhou / TempBalance
[NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
☆24 Updated 9 months ago
Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆57 Updated 6 months ago
shiqiangw / iclr2024-scores
☆52 Updated 8 months ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models"
☆12 Updated last week
EnnengYang / AdaMerging
AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.
☆40 Updated 2 weeks ago
ZaydH / influence_analysis_papers
Influence Analysis and Estimation - Survey, Papers, and Taxonomy
☆58 Updated 6 months ago
ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆48 Updated 5 months ago
zhxieml / remiss-jailbreak
☆20 Updated 2 months ago
deeplearning-wisc / args
☆30 Updated 7 months ago
vfleaking / PTST
Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆16 Updated 6 months ago
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆77 Updated 4 months ago
nik-dim / tall_masks
Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]
☆27 Updated 4 months ago
TRAIS-Lab / dattri
`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
☆27 Updated this week
tmllab / 2023_ICLR_Moderate-DS
☆24 Updated last year
EfficientTraining / LabelBench
☆38 Updated 6 months ago
mmatena / model_merging
☆61 Updated 2 years ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆78 Updated last week
tanganke / fusion_bench
FusionBench: A Comprehensive Benchmark of Deep Model Fusion
☆42 Updated 2 weeks ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆22 Updated 2 weeks ago
YangRui2015 / RiC
Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"
☆38 Updated last month
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆79 Updated 3 months ago
MiaoXiong2320 / ProximityBias-Calibration
☆16 Updated 10 months ago
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆22 Updated 2 months ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆36 Updated last month