A multi-player tournament benchmark that tests LLMs on social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate one another.
☆302 · Jan 7, 2026 · Updated 4 months ago
Alternatives and similar repositories for elimination_game
Users interested in elimination_game are comparing it to the repositories listed below.
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM… ☆86 · Dec 9, 2025 · Updated 5 months ago
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆223 · May 1, 2026 · Updated last week
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆35 · Mar 20, 2025 · Updated last year
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆33 · Mar 20, 2025 · Updated last year
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆40 · Apr 10, 2025 · Updated last year
- This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, moti… ☆372 · Apr 29, 2026 · Updated last week
- The BAZAAR challenges LLMs to navigate the double-auction marketplace, where buyers and sellers must make strategic decisions with incomp… ☆37 · Jul 30, 2025 · Updated 9 months ago
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆246 · Aug 7, 2025 · Updated 9 months ago
- ☆136 · May 2, 2025 · Updated last year
- Animal Harm Assessment public repository ☆12 · Updated this week
- Cell type annotation with local Large Language Models (LLMs) - Ensuring privacy and speed with extensive customized reports ☆153 · Oct 25, 2024 · Updated last year
- ☆166 · Mar 24, 2025 · Updated last year
- Mine-tuning is a methodology for synchronizing human and AI attention. ☆20 · Jun 16, 2024 · Updated last year
- A pure lazy functional programming language to make ASCII art animations (and other things too) ☆45 · Apr 28, 2026 · Updated last week
- Real-time webcam demo with SmolVLM (mlx-community/SmolVLM-Instruct-4bit) and MLX-VLM ☆26 · Jun 12, 2025 · Updated 10 months ago
- Documents the style side of the short-story Creative Writing LLM benchmark: we generated many short stories with a range of LLMs, then an… ☆24 · Dec 18, 2025 · Updated 4 months ago
- A simple external application for Windows that allows you to scan an existing custom_nodes directory and generate a list of the nodes ins… ☆20 · Jul 6, 2025 · Updated 10 months ago
- State-of-the-art browsing agent (WebArena 72.7%) ☆367 · Oct 2, 2025 · Updated 7 months ago
- ValTown MCP Server - Execute ValTown functions from AI assistants ☆15 · Aug 12, 2025 · Updated 8 months ago
- ☆14 · Jan 16, 2025 · Updated last year
- Files related to PoC||GTFO 21:21 - NSA’s Backdoor of the PX1000-Cr ☆17 · Mar 23, 2022 · Updated 4 years ago
- A way to remotely switch Steam users using HomeKit ☆41 · Dec 3, 2025 · Updated 5 months ago
- A browser extension that demos Gemini Nano via window.ai and Cartesia TTS ⚡️ ☆38 · Jul 10, 2024 · Updated last year
- Fun with wgpu: Simulating slime mold ☆24 · Aug 22, 2024 · Updated last year
- Organise your AI's memories with graph database entries ☆79 · Dec 2, 2025 · Updated 5 months ago
- Open benchmarks for evaluating search APIs ☆102 · Mar 23, 2026 · Updated last month
- A benchmark for emotional intelligence in large language models ☆424 · Jul 26, 2024 · Updated last year
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆319 · Jun 26, 2025 · Updated 10 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Nov 11, 2024 · Updated last year
- ☆47 · Dec 2, 2025 · Updated 5 months ago
- Give your AI coding assistants access to Raygun so they can investigate, explain, and help resolve errors for you. ☆20 · Mar 2, 2026 · Updated 2 months ago
- ☆41 · Feb 5, 2026 · Updated 3 months ago
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆24 · Apr 1, 2025 · Updated last year
- 🧜‍♀️ Pi extension that renders Mermaid diagrams as ASCII in the TUI, with width-aware output and safe handling for larger diagrams. ☆54 · Feb 23, 2026 · Updated 2 months ago
- A macOS AppleScript MCP server ☆382 · Apr 19, 2025 · Updated last year
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Apr 9, 2025 · Updated last year
- Vibe Styler is a Chrome extension that can restyle any website with a simple prompt, powered by Google Gemini 2.5 ☆16 · Apr 9, 2025 · Updated last year
- A fairly lightweight daemon that keeps your computer awake. Designed for rootless environments. ☆24 · May 3, 2019 · Updated 7 years ago
- 🐇 A rabbit-fast Rust reimplementation inspired by Claude Code, with native TUI, deeper tooling, and a cleaner path for terminal-first AI … ☆44 · Apr 9, 2026 · Updated 3 weeks ago