A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.
☆301 · Jan 7, 2026 · Updated 2 months ago
Alternatives and similar repositories for elimination_game
Users who are interested in elimination_game are comparing it to the repositories listed below.
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM… ☆84 · Dec 9, 2025 · Updated 3 months ago
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆199 · Mar 6, 2026 · Updated 3 weeks ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆34 · Mar 20, 2025 · Updated last year
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆31 · Mar 20, 2025 · Updated last year
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆39 · Apr 10, 2025 · Updated 11 months ago
- This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, moti… ☆357 · Feb 6, 2026 · Updated last month
- The BAZAAR challenges LLMs to navigate the double-auction marketplace, where buyers and sellers must make strategic decisions with incomp… ☆35 · Jul 30, 2025 · Updated 7 months ago
- Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm… ☆64 · Mar 16, 2026 · Updated last week
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆243 · Aug 7, 2025 · Updated 7 months ago
- ☆135 · May 2, 2025 · Updated 10 months ago
- Estimate the number of legal chess positions ☆13 · Jan 14, 2021 · Updated 5 years ago
- A benchmark for conversational bargaining by language models. In each 20‑round match one LLM plays buyer, one plays seller, and both hold… ☆33 · Aug 21, 2025 · Updated 7 months ago
- Collaborative AI Model ☆11 · Nov 27, 2024 · Updated last year
- Cell type annotation with local Large Language Models (LLMs) - Ensuring privacy and speed with extensive customized reports ☆153 · Oct 25, 2024 · Updated last year
- A toy Inspect implementation of the Bliss Attractor eval from Claude 4 System Card Welfare Assessment ☆38 · Jun 5, 2025 · Updated 9 months ago
- ☆200 · May 5, 2025 · Updated 10 months ago
- a pure lazy functional programming language to make ASCII art animations (and other things too) ☆43 · Mar 14, 2026 · Updated 2 weeks ago
- ☆165 · Mar 24, 2025 · Updated last year
- Mine-tuning is a methodology for synchronizing human and AI attention. ☆19 · Jun 16, 2024 · Updated last year
- ☆12 · Oct 5, 2024 · Updated last year
- state of the art browsing agent (WebArena 72.7%) ☆366 · Oct 2, 2025 · Updated 5 months ago
- A utility that uses Whisper to transcribe videos and various translation APIs to translate the transcribed text and save them as SRT (sub… ☆74 · Aug 30, 2024 · Updated last year
- Documents the style side of the short-story Creative Writing LLM benchmark: we generated many short stories with a range of LLMs, then an… ☆22 · Dec 18, 2025 · Updated 3 months ago
- A simple external application for Windows that allows you to scan an existing custom_nodes directory and generate a list of the nodes ins… ☆20 · Jul 6, 2025 · Updated 8 months ago
- ☆48 · May 27, 2025 · Updated 10 months ago
- Craziness. ☆29 · Feb 10, 2025 · Updated last year
- llms can learn their own context compression via RL ☆42 · Nov 26, 2025 · Updated 4 months ago
- ☆14 · Jan 16, 2025 · Updated last year
- A way to remotely switch Steam users using HomeKit ☆41 · Dec 3, 2025 · Updated 3 months ago
- hackernews data ☆34 · Dec 14, 2025 · Updated 3 months ago
- The easiest possible implementation of an MCP server and client. Set up a server or a client in 2 lines of code. ☆23 · Jul 5, 2025 · Updated 8 months ago
- Organise your AI's memories with graph database entries ☆77 · Dec 2, 2025 · Updated 3 months ago
- Beating the `bisect` module's implementation using C-extensions. ☆32 · May 19, 2023 · Updated 2 years ago
- ☆12 · Jan 19, 2024 · Updated 2 years ago
- A benchmark for emotional intelligence in large language models ☆417 · Jul 26, 2024 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Nov 11, 2024 · Updated last year
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆318 · Jun 26, 2025 · Updated 9 months ago
- A GTK graphical interface for chatting with large language models (LLMs) ☆84 · Dec 15, 2025 · Updated 3 months ago
- LisanBench is a lightweight benchmark for LLMs that stresses forward planning, vocabulary depth, constraint adherence, attention, and lon… ☆32 · Jun 1, 2025 · Updated 9 months ago