petergpt / bullshit-benchmark
BullshitBench, created by Peter Gostev, measures whether AI models challenge nonsensical prompts instead of confidently answering them.
1,161 · Mar 18, 2026 · Updated this week

Alternatives and similar repositories for bullshit-benchmark

Users who are interested in bullshit-benchmark are comparing it to the libraries listed below.

