chrisvdweth / selene
An open, large-scale, interactive textbook.
☆48 · Updated last week
Alternatives and similar repositories for selene
Users who are interested in selene are comparing it to the libraries listed below.
- Papers on fairness in NLP ☆450 · Updated last year
- ☆155 · Updated 2 years ago
- ☆54 · Updated last year
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ☆60 · Updated last year
- A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP ☆951 · Updated last year
- A reading list for papers on causality for natural language processing (NLP) ☆673 · Updated 5 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆83 · Updated 8 months ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆125 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆150 · Updated 2 months ago
- A resource repository for representation engineering in large language models ☆140 · Updated 11 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆377 · Updated last year
- ☆237 · Updated last year
- A reading list of up-to-date papers on NLP for Social Good. ☆304 · Updated 2 years ago
- [NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning metho… ☆405 · Updated last month
- A resource repository for machine unlearning in large language models ☆503 · Updated 3 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆40 · Updated last year
- A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current… ☆235 · Updated 10 months ago
- Resources for cultural NLP research ☆106 · Updated last month
- The latest papers about detection of LLM-generated text and code ☆280 · Updated 4 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆98 · Updated 2 years ago
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense… ☆179 · Updated 2 years ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆76 · Updated last year
- Python package for measuring memorization in LLMs. ☆173 · Updated 3 months ago
- Aligning AI With Shared Human Values (ICLR 2021) ☆302 · Updated 2 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆116 · Updated 8 months ago
- ☆208 · Updated 11 months ago
- ☆116 · Updated last year
- Repository for research in the field of Responsible NLP at Meta. ☆202 · Updated 5 months ago
- A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and emerging privacy risk of LM … ☆32 · Updated 8 months ago
- ☆28 · Updated last year