shacharKZ / VISIT-Visualizing-Transformers
☆26 · Updated last year
Alternatives and similar repositories for VISIT-Visualizing-Transformers
Users who are interested in VISIT-Visualizing-Transformers are comparing it to the libraries listed below.
- ☆64 · Updated 2 years ago
- ☆51 · Updated 2 months ago
- [ICLR 2025] General-purpose activation steering library · ☆108 · Updated 3 weeks ago
- ☆106 · Updated 7 months ago
- ☆14 · Updated 2 years ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models · ☆74 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. · ☆56 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. · ☆147 · Updated last month
- Sparse probing paper full code. · ☆61 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models · ☆116 · Updated 2 years ago
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… · ☆97 · Updated 4 years ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers · ☆41 · Updated 7 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models · ☆82 · Updated last year
- ☆51 · Updated last year
- ☆97 · Updated last year
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) · ☆87 · Updated 2 years ago
- ☆231 · Updated last year
- Code and data for Marked Personas (ACL 2023) · ☆28 · Updated 2 years ago
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models · ☆92 · Updated 2 years ago
- Repository for research in the field of Responsible NLP at Meta. · ☆202 · Updated 4 months ago
- ☆18 · Updated last month
- Steering Llama 2 with Contrastive Activation Addition · ☆187 · Updated last year
- ☆115 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. · ☆80 · Updated 7 months ago
- Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23). · ☆15 · Updated 9 months ago
- MEND: Fast Model Editing at Scale · ☆250 · Updated 2 years ago
- ☆78 · Updated 2 years ago
- This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP". · ☆88 · Updated 4 years ago
- ☆56 · Updated 2 years ago
- Official Code for the papers: "Controlled Text Generation as Continuous Optimization with Multiple Constraints" and "Gradient-based Const… · ☆63 · Updated last year