☆29Aug 9, 2023Updated 2 years ago
Alternatives and similar repositories for CBBQ
Users who are interested in CBBQ are comparing it to the libraries listed below
- ☆20Mar 17, 2025Updated 11 months ago
- A new release of Chinese sexism dataset and lexicon☆14May 23, 2023Updated 2 years ago
- Flames is a highly adversarial Chinese benchmark for evaluating the harmlessness of LLMs, developed by Shanghai AI Lab and Fudan NLP Group.☆63May 21, 2024Updated last year
- Code for "Goodtriever: Toxicity Mitigation with Retrieval-augmented Language Models"☆25May 30, 2024Updated last year
- A Chinese corpus for gender bIas probing and mitigation, which contains 32.9k sentences with high-quality labels.☆20Aug 15, 2024Updated last year
- Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"☆19Oct 9, 2023Updated 2 years ago
- Repository for the Bias Benchmark for QA dataset.☆139Jan 8, 2024Updated 2 years ago
- ☆30Feb 16, 2024Updated 2 years ago
- Research on evaluating and aligning the values of Chinese large language models☆554Jul 20, 2023Updated 2 years ago
- An R package implementing computational models of Eriksen flanker task performance.☆10Sep 19, 2025Updated 5 months ago
- Urai AE, de Gee JW, Tsetsos K, Donner TH (2019) Choice history biases subsequent evidence accumulation. eLife☆15Jul 20, 2020Updated 5 years ago
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated last year
- ☆12Jan 11, 2026Updated last month
- Measuring the exact distance between gaps and objects is very difficult; this project explores some possibilities using OpenCV☆10Nov 24, 2018Updated 7 years ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- A Paddle reimplementation of the paper ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (ACL 2021)☆10Nov 15, 2021Updated 4 years ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- Fortifying Toxic Speech Detectors Against Veiled Toxicity☆11Oct 21, 2020Updated 5 years ago
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]☆273Jul 28, 2025Updated 7 months ago
- ☆44Jun 29, 2023Updated 2 years ago
- benchmarks for evaluating MT models☆11Jun 26, 2024Updated last year
- ☆20Sep 30, 2025Updated 5 months ago
- distilled Self-Critique refines the outputs of an LLM with only synthetic data☆11Apr 11, 2024Updated last year
- Align, a general text alignment function☆15Dec 7, 2023Updated 2 years ago
- LLM benchmarks☆13Feb 22, 2024Updated 2 years ago
- ☆12Nov 5, 2024Updated last year
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks)☆10Jan 11, 2024Updated 2 years ago
- Replication code for "The Structure of Toxic Conversations on Twitter" (WWW'21)☆10May 25, 2021Updated 4 years ago
- ☆11Apr 28, 2024Updated last year
- ☆11Nov 5, 2024Updated last year
- ☆11Jan 3, 2024Updated 2 years ago
- Survey of available speech datasets for Polish ASR development☆17Jan 1, 2025Updated last year
- Website for release of TellMeWhy dataset for why question answering☆14Nov 11, 2022Updated 3 years ago
- Supplementary material for "A practical guide for transparency in psychological science" (find the paper at https://psyarxiv.com/rtygm/)☆10Nov 24, 2021Updated 4 years ago
- ☆10Apr 5, 2022Updated 3 years ago
- Tools and examples for fitting (Hierarchical) Drift Diffusion Models in R☆11Jul 11, 2023Updated 2 years ago
- 🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3"☆13Mar 26, 2024Updated last year
- ☆11Oct 15, 2022Updated 3 years ago