lechmazur / elimination_game
A multi-player tournament benchmark that tests LLMs on social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.
☆298 · Jan 7, 2026 · Updated last month
Alternatives and similar repositories for elimination_game
Users interested in elimination_game are comparing it to the libraries listed below.
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM… ☆85 · Dec 9, 2025 · Updated 2 months ago
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆193 · Feb 6, 2026 · Updated last week
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆35 · Mar 20, 2025 · Updated 10 months ago
- This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, moti… ☆342 · Feb 6, 2026 · Updated last week
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆39 · Apr 10, 2025 · Updated 10 months ago
- A toy Inspect implementation of the Bliss Attractor eval from Claude 4 System Card Welfare Assessment ☆38 · Jun 5, 2025 · Updated 8 months ago
- Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm… ☆63 · Sep 22, 2025 · Updated 4 months ago
- A simple external application for Windows that allows you to scan an existing custom_nodes directory and generate a list of the nodes ins… ☆20 · Jul 6, 2025 · Updated 7 months ago
- Real-time webcam demo with SmolVLM (mlx-community/SmolVLM-Instruct-4bit) and MLX-VLM ☆25 · Jun 12, 2025 · Updated 8 months ago
- Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers. ☆243 · Aug 7, 2025 · Updated 6 months ago
- A fairly lightweight daemon that keeps your computer awake. Designed for rootless environments. ☆23 · May 3, 2019 · Updated 6 years ago
- The easiest possible implementation of an MCP server and client. Set up a server or a client in 2 lines of code. ☆23 · Jul 5, 2025 · Updated 7 months ago
- hackernews data ☆33 · Dec 14, 2025 · Updated 2 months ago
- State-of-the-art browsing agent (WebArena 72.7%) ☆364 · Oct 2, 2025 · Updated 4 months ago
- ☆200 · May 5, 2025 · Updated 9 months ago
- Cell type annotation with local Large Language Models (LLMs) - Ensuring privacy and speed with extensive customized reports ☆152 · Oct 25, 2024 · Updated last year
- ☆164 · Mar 24, 2025 · Updated 10 months ago
- A holistic benchmark for LLM abstention ☆69 · Aug 27, 2025 · Updated 5 months ago
- Fun with wgpu: Simulating slime mold ☆24 · Aug 22, 2024 · Updated last year
- A JPEG Image Compression Service using Part Homomorphic Encryption. ☆31 · Mar 7, 2025 · Updated 11 months ago
- pano date resolver stuff ☆17 · Dec 3, 2025 · Updated 2 months ago
- ☆17 · Aug 5, 2025 · Updated 6 months ago
- ValTown MCP Server - Execute ValTown functions from AI assistants ☆15 · Aug 12, 2025 · Updated 6 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Nov 11, 2024 · Updated last year
- A browser extension that demos Gemini Nano via window.ai and Cartesia TTS ⚡️ ☆38 · Jul 10, 2024 · Updated last year
- A central registry and HTTP interface for coordinating Model Context Protocol (MCP) servers. ☆34 · Dec 29, 2024 · Updated last year
- A way to remotely switch Steam users using HomeKit ☆41 · Dec 3, 2025 · Updated 2 months ago
- Tutorial for TikZ ☆11 · Apr 3, 2025 · Updated 10 months ago
- Code for paper "Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs" ☆12 · Jun 11, 2025 · Updated 8 months ago
- ☆16 · Jul 1, 2025 · Updated 7 months ago
- A wrapper around libssh2 for .NET ☆29 · Jan 21, 2026 · Updated 3 weeks ago
- Repository of papers released by Modulus Labs ☆13 · Mar 13, 2024 · Updated last year
- 🕷️ n8n Community Node for Scrappey API – Automate web scraping and data extraction with advanced anti-bot blocking technology, seamlessl… ☆16 · Feb 2, 2026 · Updated 2 weeks ago
- ☆11 · Jan 19, 2024 · Updated 2 years ago
- Sonos server for my 5-year-old to control his speaker using an esp32s3 M5Stack CardPuter ☆24 · Sep 1, 2025 · Updated 5 months ago
- A simple, interactive web tool to compare pricing and performance metrics of various AI models. ☆16 · Dec 20, 2025 · Updated last month
- SING: SDE Inference via Natural Gradients ☆36 · Dec 9, 2025 · Updated 2 months ago
- A benchmark for emotional intelligence in large language models ☆400 · Jul 26, 2024 · Updated last year
- Craziness. ☆29 · Feb 10, 2025 · Updated last year