YFHuangxxxx / CBBQ
☆29Aug 9, 2023Updated 2 years ago
Alternatives and similar repositories for CBBQ
Users who are interested in CBBQ are comparing it to the libraries listed below.
- ☆21Mar 17, 2025Updated 11 months ago
- ☆11Oct 12, 2023Updated 2 years ago
- A new release of Chinese sexism dataset and lexicon☆13May 23, 2023Updated 2 years ago
- Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.☆63May 21, 2024Updated last year
- A Chinese corpus for gender bias probing and mitigation, which contains 32.9k sentences with high-quality labels.☆20Aug 15, 2024Updated last year
- PRODIGy is a collection of dialogues in which each conversation is aligned with speaker profile representations.☆19Jan 8, 2025Updated last year
- Code for "Goodtriever: Toxicity Mitigation with Retrieval-augmented Language Models"☆25May 30, 2024Updated last year
- A Bilingual Role Evaluation Benchmark for Large Language Models☆43Jan 9, 2024Updated 2 years ago
- Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"☆19Oct 9, 2023Updated 2 years ago
- Code and data for Marked Personas (ACL 2023)☆28May 26, 2023Updated 2 years ago
- ☆28Sep 21, 2024Updated last year
- ☆30Feb 16, 2024Updated 2 years ago
- Research on evaluating and aligning the values of Chinese large language models☆553Jul 20, 2023Updated 2 years ago
- An R package implementing computational models of Eriksen flanker task performance.☆10Sep 19, 2025Updated 4 months ago
- A collection of practical code generation tasks and tests in open source projects. Complementary to HumanEval by OpenAI.☆154Dec 25, 2024Updated last year
- The current repository is able to assess the relationship between EEG components and HDDM parameters of top-down attention in perceptual …☆15Nov 27, 2024Updated last year
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated 11 months ago
- ☆10Sep 17, 2022Updated 3 years ago
- A Paddle reproduction of the paper "ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information" (ACL 2021)☆10Nov 15, 2021Updated 4 years ago
- Fortifying Toxic Speech Detectors Against Veiled Toxicity☆11Oct 21, 2020Updated 5 years ago
- Urai AE, de Gee JW, Tsetsos K, Donner TH (2019) Choice history biases subsequent evidence accumulation. eLife☆15Jul 20, 2020Updated 5 years ago
- ☆12Jan 11, 2026Updated last month
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- R package on Nigeria and for Nigeria☆12Sep 1, 2025Updated 5 months ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- Debug DeepSpeed-Chat step by step in an IDE☆10Apr 17, 2023Updated 2 years ago
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- ☆44Jun 29, 2023Updated 2 years ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]☆272Jul 28, 2025Updated 6 months ago
- 🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3"☆13Mar 26, 2024Updated last year
- ☆20Sep 30, 2025Updated 4 months ago
- ☆10Apr 5, 2022Updated 3 years ago
- Replication code for "The Structure of Toxic Conversations on Twitter" (WWW'21)☆10May 25, 2021Updated 4 years ago
- The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈☆16Updated this week
- distilled Self-Critique refines the outputs of an LLM with only synthetic data☆11Apr 11, 2024Updated last year
- ☆12Mar 5, 2025Updated 11 months ago
- A Paddle reproduction of the paper "Recipes for building an open-domain chatbot"☆11Nov 1, 2021Updated 4 years ago
- Code and Data for GlitchBench☆13Feb 27, 2024Updated last year
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks)☆10Jan 11, 2024Updated 2 years ago