petergpt / bullshit-benchmark
BullshitBench, created by Peter Gostev, measures whether AI models challenge nonsensical prompts instead of confidently answering them.
1,401 · Apr 8, 2026 · Updated this week

Alternatives and similar repositories for bullshit-benchmark

Users who are interested in bullshit-benchmark compare it to the repositories listed below.
