lechmazur / deceptionLinks

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.

☆28

Alternatives and similar repositories for deception

Users that are interested in deception are comparing it to the libraries listed below

Sorting:

lechmazur / divergent
LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth…
☆31Updated 4 months ago
lechmazur / step_game
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM…
☆56Updated last week
lechmazur / nyt-connections
Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words
☆130Updated last week
slashml / awesome-finetuning
☆28Updated 10 months ago
lechmazur / generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm…
☆60Updated last week
OpenPipe / rl-experiments
OpenPipe Reinforcement Learning Experiments
☆27Updated 4 months ago
Belluxx / LocalAIME
Test your local LLMs on the AIME problems
☆32Updated last month
severian42 / Computational-Model-for-Symbolic-Representations
Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI …
☆49Updated 5 months ago
zenforic / csm-multi
Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G…
☆23Updated 3 months ago
JD-P / RetroInstruct
Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search.
☆32Updated 4 months ago
the-crypt-keeper / fragment-frog
🐸 A non-linear AI-enabled text editing environment
☆11Updated 6 months ago
d0rc / deepdive
Conduct in-depth research with AI-driven insights : DeepDive is a command-line tool that leverages web searches and AI models to generate…
☆42Updated 10 months ago
tegridydev / abstract-agent
Locally hosted AI Agent Python Tool To Generate Novel Research Hypothesis + Titles + Abstracts
☆25Updated 2 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28Updated last month
Alignment-Lab-AI / KnowledgeBase
never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you…
☆37Updated last year
catena-labs / moa-llm
A Python library to orchestrate LLMs in a neural network-inspired structure
☆49Updated 9 months ago
slashml / awesome-small-language-models
☆41Updated 10 months ago
NolanoOrg / SpectraSuite
☆49Updated last year
attashe / ModifiedBeamSampler
Modified Beam Search with periodical restart
☆12Updated 10 months ago
mounta11n / plusplus-camall
After my server ui improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around…
☆54Updated 11 months ago
syv-ai / OpenArena
☆14Updated last year
TC-Zheng / ActuosusAI
AI management tool
☆118Updated 8 months ago
astramind-ai / Pulsar
The hearth of The Pulsar App, fast, secure and shared inference with modern UI
☆57Updated 7 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140Updated 5 months ago
AndrewVeee / assistant-demo
Demo of an "always-on" AI assistant.
☆24Updated last year
Forest-Person / smolResearcher
Use smol agents to do research and then update csv coumns with its findings.
☆41Updated 5 months ago
lechmazur / confabulations
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
☆186Updated last week
mgerstgrasser / tacheles
a lightweight, open-source blueprint for building powerful and scalable LLM chat applications
☆28Updated last year
severian42 / Proteus-The-Genesis-LLM
Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine
☆23Updated 7 months ago
SicariusSicariiStuff / SLOP_Detector
SLOP Detector and analyzer based on dictionary for shareGPT JSON and text
☆72Updated 8 months ago