petergpt/bullshit-benchmark

BullshitBench, created by Peter Gostev, measures whether AI models challenge nonsensical prompts instead of confidently answering them.

☆1,779

Alternatives and similar repositories for bullshit-benchmark

Users who are interested in bullshit-benchmark are comparing it to the repositories listed below.

earendil-works / pi
AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI
☆73,494 · Updated this week
anomalyco / opencode
The open source coding agent.
☆187,806 · Updated this week
karpathy / autoresearch
AI agents running research on single-GPU nanochat training automatically
☆91,629 · Mar 26, 2026 · Updated 3 months ago
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,541 · Updated this week
datacurve-ai / deep-swe
Measuring frontier coding agents on original, long-horizon engineering tasks
☆1,182 · Jul 10, 2026 · Updated last week
p-e-w / heretic
Fully automatic censorship removal for language models
☆26,529 · Updated this week
JuliusBrussee / caveman
🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman
☆91,105 · Jul 3, 2026 · Updated 2 weeks ago
techfort / cronpulse-community
a dead simple monitoring service with real time alerts
☆19 · Jan 29, 2026 · Updated 5 months ago
googleworkspace / cli
Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Goog…
☆29,853 · Updated this week
BerriAI / litellm
The fastest, litest AI Gateway. Rust core with Python SDK. Call 100+ LLM APIs in OpenAI (or native) format with cost tracking, guardrails…
☆54,122 · Updated this week
AlexsJones / llmfit
Hundreds of models & providers. One command to find what runs on your hardware.
☆29,863 · Updated this week
promptfoo / promptfoo
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, De…
☆23,439 · Updated this week
karpathy / nanochat
The best ChatGPT that $100 can buy.
☆56,462 · Jul 4, 2026 · Updated 2 weeks ago
google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…
☆37,615 · Jul 2, 2026 · Updated 2 weeks ago
open-webui / open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
☆146,085 · Updated this week
tobi / qmd
mini cli search engine for your docs, knowledge bases, meeting notes, whatever. Tracking current sota approaches while being all local
☆28,075 · Jun 24, 2026 · Updated 3 weeks ago
aaif-goose / goose
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
☆51,333 · Updated this week
toon-format / toon
🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.
☆24,937 · Updated this week
davis7dotsh / grep-bench
Benchmark for how well different models search codebases
☆16 · Jun 9, 2026 · Updated last month
ggml-org / llama.cpp
LLM inference in C/C++
☆121,053 · Updated this week
Aider-AI / aider
aider is AI pair programming in your terminal
☆47,541 · May 22, 2026 · Updated last month
microsoft / BitNet
Official inference framework for 1-bit LLMs
☆39,765 · Updated this week
rtk-ai / rtk
CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies
☆72,039 · Updated this week
RightNow-AI / openfang
Open-source Agent Operating System
☆18,038 · Jul 2, 2026 · Updated 2 weeks ago
openai / symphony
Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding ag…
☆26,060 · Updated this week
paperclipai / paperclip
The open-source app everyone uses to manage agents at work
☆74,288 · Updated this week
nanocoai / nanoclaw
A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and oth…
☆30,300 · Updated this week
upstash / context7
Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors
☆59,474 · Updated this week
algorithmicsuperintelligence / optillm
Optimizing inference proxy for LLMs
☆4,179 · Updated this week
elder-plinius / OBLITERATUS
OBLITERATE THE CHAINS THAT BIND YOU
☆7,027 · Jun 17, 2026 · Updated last month
gastownhall / beads
Beads - A memory upgrade for your coding agent
☆25,450 · Updated this week
gastownhall / gastown
Gas Town - multi-agent workspace manager
☆17,118 · Updated this week
davebcn87 / pi-autoresearch
Autonomous experiment loop extension for pi
☆7,215 · Updated this week
OpenHands / OpenHands
🙌 OpenHands: AI-Driven Development
☆81,406 · Updated this week
alexzhang13 / rlm
General plug-and-play inference library for Recursive Language Models (RLMs), supporting various sandboxes.
☆5,272 · Jun 26, 2026 · Updated 3 weeks ago
manaflow-ai / cmux
Open source Ghostty-based macOS terminal with vertical tabs and notifications for AI coding agents. Built for multitasking, organization,…
☆24,843 · Updated this week
MemPalace / mempalace
The best-benchmarked open-source AI memory system. And it's free.
☆57,506 · Updated this week
andrewyng / context-hub
☆13,808 · May 31, 2026 · Updated last month
pydantic / monty
A minimal, secure Python interpreter written in Rust for use by AI
☆7,906 · Updated this week