SALT-NLP / CultureBank
☆47Updated 4 months ago
Alternatives and similar repositories for CultureBank
Users who are interested in CultureBank are comparing it to the libraries listed below.
- ☆19Updated 11 months ago
- Multilingual Large Language Models Evaluation Benchmark☆133Updated last year
- First explanation metric (diagnostic report) for text generation evaluation☆62Updated 11 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆63Updated 2 years ago
- Code and data for the FACTOR paper☆54Updated 2 years ago
- ☆91Updated last year
- ☆187Updated 7 months ago
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering.☆16Updated 2 years ago
- BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages☆45Updated 6 months ago
- A curated list of research papers and resources on Cultural LLM.☆53Updated last year
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.☆50Updated 2 years ago
- UnQovering Stereotyping Biases via Underspecified Questions - EMNLP 2020 (Findings)☆21Updated 4 years ago
- Token-level Reference-free Hallucination Detection☆98Updated 2 years ago
- ☆33Updated last month
- ☆22Updated 2 years ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation☆214Updated 2 years ago
- Templates and other documents regarding responsible NLP research☆70Updated 2 years ago
- 🌲 Code for our EMNLP 2023 paper - 🎄 "Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Mode…☆54Updated 2 years ago
- Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023)☆31Updated 3 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions☆119Updated last year
- Repository for the Bias Benchmark for QA dataset.☆137Updated 2 years ago
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models☆47Updated 2 years ago
- ☆15Updated 3 years ago
- Codebase, data and models for the SummaC paper in TACL☆108Updated last year
- Source Code of Paper "GPTScore: Evaluate as You Desire"☆258Updated 2 years ago
- ☆78Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆81Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models☆51Updated 2 years ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain between cause-effect relationships. It is a QA dataset containing 9000…☆48Updated 2 years ago
- The official repo for SocKET: Social Knowledge Evaluation Tests☆24Updated 9 months ago