liamdugan / raid
RAID is the largest and most challenging benchmark for machine-generated text detectors. (ACL 2024)
☆36 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for raid
- Repository for the Bias Benchmark for QA dataset. ☆87 Updated 10 months ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ☆71 Updated 6 months ago
- ☆111 Updated last year
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense… ☆138 Updated last year
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆58 Updated 8 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆54 Updated 10 months ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆292 Updated 6 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆64 Updated 10 months ago
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ☆38 Updated last month
- ☆39 Updated last year
- ☆48 Updated this week
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge ☆11 Updated 9 months ago
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆73 Updated 2 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆76 Updated this week
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆43 Updated 3 months ago
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models ☆15 Updated 4 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆97 Updated 7 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆83 Updated 4 months ago
- ☆48 Updated 7 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆78 Updated last year
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆48 Updated this week
- ☆94 Updated 6 months ago
- Code and datasets for the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆79 Updated 8 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆154 Updated last month
- ☆22 Updated 8 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆126 Updated last year
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆107 Updated last month
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆61 Updated 7 months ago
- Official Code for EMNLP 2023 Main Conference paper: "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detec… ☆29 Updated last year
- A Survey on Data Selection for Language Models ☆182 Updated last month