liamdugan / raid
RAID is the largest and most challenging benchmark for machine-generated text detectors. (ACL 2024)
☆27 · Updated this week
Related projects:
- Repository for the Bias Benchmark for QA dataset. ☆83 · Updated 8 months ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ☆61 · Updated 4 months ago
- Dataset associated with the paper "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation". ☆63 · Updated 3 years ago
- Repo for the paper "Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge". ☆10 · Updated 7 months ago
- M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection. ☆15 · Updated 5 months ago
- Weak-to-Strong Jailbreaking on Large Language Models. ☆62 · Updated 6 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models". ☆55 · Updated 8 months ago
- Code for the AAAI 2023 paper "Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Gene…". ☆16 · Updated last year
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper "R-Tuning: Instructing Large Language Models to Say 'I Don't…". ☆82 · Updated 2 months ago
- Companion code for "FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models" (ACL 2024). ☆27 · Updated last month
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense…". ☆131 · Updated 10 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model. ☆59 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆114 · Updated 11 months ago
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models". ☆64 · Updated 2 weeks ago
- LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces. ☆73 · Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions. ☆96 · Updated last week
- A resource repository for representation engineering in large language models. ☆36 · Updated last week
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models". ☆54 · Updated 8 months ago
- Code and data for the paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations". ☆56 · Updated 6 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models. ☆67 · Updated last week
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. ☆50 · Updated 2 months ago