Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
☆33 · Mar 20, 2025 · Updated last year
Alternatives and similar repositories for deception
Users who are interested in deception are comparing it to the libraries listed below.
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆35 · Mar 20, 2025 · Updated last year
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM… ☆87 · Dec 9, 2025 · Updated 6 months ago
- Documents the style side of the short-story Creative Writing LLM benchmark: we generated many short stories with a range of LLMs, then an… ☆24 · Dec 18, 2025 · Updated 5 months ago
- Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm… ☆71 · Apr 16, 2026 · Updated last month
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆41 · Apr 10, 2025 · Updated last year
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆246 · Aug 7, 2025 · Updated 10 months ago
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆228 · May 28, 2026 · Updated 2 weeks ago
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆302 · Jan 7, 2026 · Updated 5 months ago
- The BAZAAR challenges LLMs to navigate the double-auction marketplace, where buyers and sellers must make strategic decisions with incomp… ☆37 · Jul 30, 2025 · Updated 10 months ago
- Vibe Styler is a Chrome Extension that can restyle any website with a simple prompt, powered by Google Gemini 2.5 ☆16 · Apr 9, 2025 · Updated last year
- Modified Beam Search with periodical restart ☆12 · Sep 12, 2024 · Updated last year
- This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, moti… ☆388 · Updated this week
- A new benchmark for measuring LLMs' capability to detect bugs in large codebases. ☆33 · Jun 5, 2024 · Updated 2 years ago
- Blueprint by Mozilla.ai on how to transcribe audio files ☆23 · Jun 13, 2025 · Updated last year
- Experimental sampler to make LLMs more creative ☆31 · Aug 2, 2023 · Updated 2 years ago
- The heart of The Pulsar App: fast, secure, and shared inference with a modern UI ☆59 · Dec 1, 2024 · Updated last year
- workspace-cli is a Rust-based command-line tool designed to provide programmatic access to Google Workspace APIs with structured JSON out… ☆37 · Mar 10, 2026 · Updated 3 months ago
- Comparison of gradient estimation techniques for black-box adversarial examples ☆11 · Oct 31, 2018 · Updated 7 years ago
- DEF CON 31 AI Village - LLMs: Loose Lips Multipliers ☆10 · Aug 16, 2023 · Updated 2 years ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ☆13 · May 5, 2024 · Updated 2 years ago
- Mastodoner is a command line tool (and Python library) for archiving Mastodon, a decentralized micro-blogging social network. ☆14 · Oct 21, 2024 · Updated last year
- Idiomatic Python bindings for Google Go ☆22 · Dec 11, 2019 · Updated 6 years ago
- Keras implementation of: Fitted Learning: Models with Awareness of their Limits ☆13 · Mar 23, 2017 · Updated 9 years ago
- 🧬 Viral genome reference alignment ☆12 · Jan 26, 2021 · Updated 5 years ago
- OpenClaw Operator gives coding agents like Codex and Claude Code the context and playbooks needed to set up, validate, and troubleshoot a… ☆20 · Mar 7, 2026 · Updated 3 months ago
- Vulnerable Grails application ☆43 · Jun 12, 2015 · Updated 11 years ago
- Code database for Fast Texform generation as proposed in the work of Deza, Chen, Long and Konkle (CCN 2019). ☆12 · Jul 26, 2019 · Updated 6 years ago
- Hutter Prize Submission ☆14 · Aug 9, 2021 · Updated 4 years ago
- Privacy-preserving Voice Analysis via Disentangled Representations ☆12 · Aug 30, 2021 · Updated 4 years ago
- AI-powered Ollama Modelfile Generator ☆27 · May 28, 2024 · Updated 2 years ago
- Encode/decode UTF-8, UTF-16, and UTF-32. ☆14 · Oct 1, 2021 · Updated 4 years ago
- An empirical investigation of deep learning theory ☆16 · Oct 3, 2019 · Updated 6 years ago
- Vulnerable Windows Driver with exploits which were used for demonstration purposes on Hunting and exploiting bugs in kernel drivers prese… ☆13 · Jan 29, 2013 · Updated 13 years ago