k-hanawa / criteria_for_instance_based_explanation
☆9 Updated last year
Related projects
Alternatives and complementary repositories for criteria_for_instance_based_explanation
- ☆25 Updated 4 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆28 Updated 4 months ago
- ☆24 Updated last year
- ☆11 Updated 2 years ago
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) ☆82 Updated last year
- Official Repository for ICML 2023 paper "Can Neural Network Memorization Be Localized?" ☆16 Updated last year
- ☆26 Updated 9 months ago
- ☆15 Updated 9 months ago
- Code for Environment Inference for Invariant Learning (ICML 2021 Paper) ☆49 Updated 3 years ago
- Group-conditional DRO to alleviate spurious correlations ☆15 Updated 3 years ago
- This is a PyTorch reimplementation of Influence Functions from the ICML2017 best paper: Understanding Black-box Predictions via Influence… ☆16 Updated 4 years ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆54 Updated 2 weeks ago
- Model zoo for different kinds of uncertainty quantification methods used in Natural Language Processing, implemented in PyTorch. ☆47 Updated last year
- Code for preprint: Summarizing Differences between Text Distributions with Natural Language ☆42 Updated last year
- [ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Y… ☆83 Updated last year
- ☆28 Updated 3 years ago
- ☆25 Updated 3 years ago
- ☆26 Updated 3 weeks ago
- Implementation for Variational Information Bottleneck for Effective Low-resource Fine-tuning, ICLR 2021 ☆38 Updated 3 years ago
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 Updated last year
- Code for "Universal Adversarial Triggers Are Not Universal." ☆16 Updated 6 months ago
- ☆26 Updated 6 months ago
- Influence Analysis and Estimation - Survey, Papers, and Taxonomy ☆63 Updated 8 months ago
- ☆86 Updated last year
- A framework for assessing and improving classification fairness. ☆33 Updated last year
- ☆15 Updated 4 years ago
- ☆17 Updated 10 months ago
- ☆19 Updated last month
- ☆16 Updated 4 months ago
- ☆21 Updated last month