GAIR-NLP / SimulateBench

GPT as Human

☆18

Alternatives and similar repositories for SimulateBench:

Users who are interested in SimulateBench are comparing it to the repositories listed below.

qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆55 Updated 6 months ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆31 Updated 5 months ago
wangjs9 / Muffin
Code for Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback (ACL 2024 Findings)
☆13 Updated 6 months ago
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆21 Updated 10 months ago
iwangjian / Color4Dial
Code and data for "Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue" (ACL Findings 2023).
☆22 Updated last year
GAIR-NLP / alignment-for-honesty
☆72 Updated 8 months ago
iwangjian / TopDial
Code and data for "Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation" (EMNLP 2023…
☆30 Updated 9 months ago
edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆53 Updated 9 months ago
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆62 Updated 11 months ago
Zce1112zslx / IKE
☆40 Updated last year
Re-Align / AlignTDS
Analyzing LLM Alignment via Token Distribution Shift
☆15 Updated last year
wzhouad / context-faithful-llm
Code and data for paper "Context-faithful Prompting for Large Language Models".
☆39 Updated last year
ChengpengLi1003 / DotaMath
☆26 Updated last month
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆57 Updated last year
RUCAIBox / HaluEval-2.0
☆38 Updated last year
siyuyuan / coscript
Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning
☆36 Updated last year
qinyiwei / InfoBench
☆52 Updated 5 months ago
GAIR-NLP / MoPS
[ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"
☆34 Updated 6 months ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆27 Updated 10 months ago
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆77 Updated 11 months ago
krystalan / chatgpt_as_nlg_evaluator
Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study
☆43 Updated last year
cooperleong00 / ToxificationReversal
Code for the paper "Self-Detoxifying Language Models via Toxification Reversal" (EMNLP 2023)
☆15 Updated last year
littlehacker26 / Discriminator-Cooperative-Unlikelihood-Prompt-Tuning
The code implementation of the EMNLP2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene…
☆25 Updated last year
dqxiu / PLMs-with-Knowledge
☆17 Updated 2 years ago
AI21Labs / factor
Code and data for the FACTOR paper
☆44 Updated last year
SumilerGAO / SunGen
☆25 Updated last year
chujiezheng / LLM-Extrapolation
Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"
☆71 Updated 7 months ago
KwanWaiChung / M4LE
Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
☆22 Updated 6 months ago
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆31 Updated 4 months ago
zhu-minjun / PAlign
Personality Alignment of Language Models
☆20 Updated 4 months ago