audreycs / MentalManip
Dataset and code for the paper MentalManip: A Dataset For Fine-grained Analysis of Mental Manipulation in Conversations (ACL'24).
☆23 · Updated 7 months ago
Alternatives and similar repositories for MentalManip
Users who are interested in MentalManip are comparing it to the repositories listed below.
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆48 · Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆150 · Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆68 · Updated last year
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆126 · Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 ☆86 · Updated last year
- Codebase for LLM Textual Hallucination Benchmark ☆66 · Updated 8 months ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆26 · Updated last year
- awesome SAE papers ☆69 · Updated 7 months ago
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆33 · Updated 9 months ago
- LLM Unlearning ☆178 · Updated 2 years ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆92 · Updated last year
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models ☆115 · Updated 5 months ago
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language … ☆157 · Updated 7 months ago
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models ☆64 · Updated 9 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆126 · Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆118 · Updated last year
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆140 · Updated 4 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆133 · Updated 5 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆71 · Updated 3 years ago
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" ☆142 · Updated 2 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆50 · Updated 2 years ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ☆288 · Updated this week
- Source code of our paper MIND, ACL 2024 Long Paper ☆59 · Updated last month
- A resource repository for representation engineering in large language models ☆143 · Updated last year