wangyu-ustc / LargeScaleWashing
The official implementation of the paper "Large Scale Knowledge Washing"
☆10 · Updated last year
Alternatives and similar repositories for LargeScaleWashing
Users interested in LargeScaleWashing are comparing it to the repositories listed below.
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆27 · Updated last year
- Some code for "Stealing Part of a Production Language Model" ☆22 · Updated last year
- ☆41 · Updated last year
- Augmenting Statistical Models with Natural Language Parameters ☆29 · Updated last year
- ☆46 · Updated last year
- A symbolic benchmark for verifiable chain-of-thought financial reasoning. Includes executable templates, 58 topics across 12 domains, and… ☆20 · Updated 3 weeks ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 · Updated last year
- ☆128 · Updated 2 weeks ago
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback ☆16 · Updated last month
- The official implementation of the paper **LVChat: Facilitating Long Video Comprehension** ☆14 · Updated last year
- ☆20 · Updated 2 weeks ago
- Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX… ☆84 · Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆27 · Updated last year
- [ACL 2025 Best Paper] Language Models Resist Alignment ☆36 · Updated 5 months ago
- ☆23 · Updated last year
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning ☆14 · Updated 6 months ago
- ☆22 · Updated last year
- ☆51 · Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024) ☆86 · Updated last year
- ☆29 · Updated 8 months ago
- Toolkit for evaluating the trustworthiness of generative foundation models. ☆123 · Updated 2 months ago
- Code for the paper "Spectral Editing of Activations for Large Language Model Alignments" ☆28 · Updated 10 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆116 · Updated 8 months ago
- ☆33 · Updated last year
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep" ☆163 · Updated 6 months ago
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆31 · Updated 9 months ago
- ICLR 2024 paper. Showing properties of safety tuning and exaggerated safety. ☆89 · Updated last year
- ☆18 · Updated 2 months ago
- [EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆38 · Updated 2 months ago
- [ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ☆25 · Updated 7 months ago