for-ai / goodtriever
Code for "Goodtriever: Toxicity Mitigation with Retrieval-augmented Language Models"
☆22 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for goodtriever
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆20 · Updated 2 years ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆54 · Updated 11 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆26 · Updated 7 months ago
- Methods and evaluation for aligning language models temporally ☆24 · Updated 8 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated last year
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 · Updated 11 months ago
- ☆40 · Updated 11 months ago
- Benchmarking Generalization to New Tasks from Natural Language Instructions ☆25 · Updated 3 years ago
- Code repository for the paper "Mission: Impossible Language Models." ☆39 · Updated 10 months ago
- ☆42 · Updated 10 months ago
- Evaluate the Quality of Critique ☆35 · Updated 5 months ago
- First explanation metric (diagnostic report) for text generation evaluation ☆61 · Updated 4 months ago
- Token-level Reference-free Hallucination Detection ☆93 · Updated last year
- ☆44 · Updated 2 months ago
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. ☆18 · Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆43 · Updated 3 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆57 · Updated last month
- ☆39 · Updated last year
- A Survey of Hallucination in Large Foundation Models ☆50 · Updated 10 months ago
- ☆39 · Updated 7 months ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models ☆22 · Updated 3 months ago
- Code and Data for NeurIPS2021 Paper "A Dataset for Answering Time-Sensitive Questions" ☆64 · Updated 2 years ago
- TBC ☆26 · Updated 2 years ago
- Code and data for the FACTOR paper ☆39 · Updated last year
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆18 · Updated last year
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆71 · Updated last week
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆42 · Updated last year
- AbstainQA, ACL 2024 ☆20 · Updated last month
- ☆58 · Updated 2 years ago
- The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Gen… ☆58 · Updated last year