☆136 · May 2, 2025 · Updated 10 months ago
Alternatives and similar repositories for SOLOBench
Users who are interested in SOLOBench are comparing it to the repositories listed below.
- Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm… ☆63 · Sep 22, 2025 · Updated 5 months ago
- Real-time webcam demo with SmolVLM (mlx-community/SmolVLM-Instruct-4bit) and MLX-VLM ☆25 · Jun 12, 2025 · Updated 8 months ago
- A fairly lightweight daemon that keeps your computer awake. Designed for rootless environments. ☆23 · May 3, 2019 · Updated 6 years ago
- documentation used in my projects ☆16 · Updated this week
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆300 · Jan 7, 2026 · Updated last month
- A toy Inspect implementation of the Bliss Attractor eval from Claude 4 System Card Welfare Assessment ☆38 · Jun 5, 2025 · Updated 9 months ago
- Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM… ☆85 · Dec 9, 2025 · Updated 2 months ago
- A benchmark for emotional intelligence in large language models ☆405 · Jul 26, 2024 · Updated last year
- Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning ☆42 · Nov 11, 2025 · Updated 3 months ago
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … ☆51 · Feb 10, 2026 · Updated 3 weeks ago
- ☆17 · Aug 5, 2025 · Updated 7 months ago
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆34 · Sep 1, 2025 · Updated 6 months ago
- German "Who Wants To Be A Millionaire" LLM Benchmarking. ☆47 · Feb 26, 2026 · Updated last week
- FamilyBench evaluation tool for testing the relational reasoning capabilities of Large Language Models (LLMs).