wangyu-ustc / LargeScaleWashing
The official implementation of the paper "Large Scale Knowledge Washing"
☆10 · Updated last year
Alternatives and similar repositories for LargeScaleWashing
Users interested in LargeScaleWashing are comparing it to the libraries listed below.
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆29 · Updated last year
- The official implementation of the paper **LVChat: Facilitating Long Video Comprehension** ☆14 · Updated last year
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback ☆17 · Updated 3 months ago
- Code for the Representation Bending paper ☆14 · Updated 6 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024) ☆89 · Updated last year
- ☆46 · Updated last year
- Augmenting Statistical Models with Natural Language Parameters ☆29 · Updated last year
- ☆51 · Updated 2 years ago
- Code for the paper "Spectral Editing of Activations for Large Language Model Alignment" ☆29 · Updated last year
- Code for ACL 2023 paper "BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases". ☆22 · Updated 2 years ago
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning ☆15 · Updated 8 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆60 · Updated last year
- [ACL 2025 Best Paper] Language Models Resist Alignment ☆41 · Updated 7 months ago
- ☆22 · Updated 3 months ago
- ☆103 · Updated last year
- ☆44 · Updated last year
- Some code for "Stealing Part of a Production Language Model" ☆22 · Updated last year
- ☆29 · Updated last year
- A symbolic benchmark for verifiable chain-of-thought financial reasoning. Includes executable templates, 58 topics across 12 domains, and… ☆25 · Updated last month
- ☆72 · Updated last year
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…) ☆28 · Updated last year
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models. ☆125 · Updated 5 months ago
- Official implementation of ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting ☆24 · Updated last year
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆32 · Updated last month
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆29 · Updated last year
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆51 · Updated 10 months ago
- ☆37 · Updated 2 years ago
- ☆32 · Updated 10 months ago
- Can Knowledge Editing Really Correct Hallucinations? (ICLR 2025) ☆27 · Updated 5 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆96 · Updated last year