allenai / real-toxicity-prompts
☆190 Updated 3 years ago
Related projects
Alternatives and complementary repositories for real-toxicity-prompts
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆67 Updated 3 years ago
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆280 Updated 5 months ago
- Token-level Reference-free Hallucination Detection ☆93 Updated last year
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment" ☆79 Updated 8 months ago
- Repository for the Bias Benchmark for QA dataset. ☆87 Updated 10 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆155 Updated 6 months ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆103 Updated 8 months ago
- ☆68 Updated 9 months ago
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆160 Updated last year
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆177 Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆101 Updated last year
- ☆167 Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆54 Updated 10 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆111 Updated 8 months ago
- ☆116 Updated 2 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆292 Updated 6 months ago
- Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning" ☆161 Updated 3 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 Updated 3 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 Updated 7 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆141 Updated 6 months ago
- ☆111 Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆240 Updated last year
- Code for the arXiv 2023 paper "Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback" ☆201 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆108 Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆104 Updated 5 months ago
- Codebase, data and models for the SummaC paper in TACL ☆85 Updated 11 months ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆211 Updated 10 months ago
- Source code for the paper "Active Prompting with Chain-of-Thought for Large Language Models" ☆219 Updated 6 months ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆213 Updated last year
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆208 Updated last year