Orolol/familyBench

FamilyBench is an evaluation tool for testing the relational reasoning capabilities of Large Language Models (LLMs).

☆47

Alternatives and similar repositories for familyBench

Users who are interested in familyBench are comparing it to the libraries listed below.

johnbean393 / SVGBench
SVGBench: A challenging LLM benchmark that tests knowledge, coding, physical reasoning capabilities of LLMs.
☆73 · Feb 12, 2026 · Updated 5 months ago
ThirdKeyAI / SchemaPin
The SchemaPin protocol for cryptographically signing and verifying AI agent tool schemas to prevent supply-chain attacks.
☆16 · Jun 25, 2026 · Updated last month
lechmazur / confabulations
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
☆247 · Aug 7, 2025 · Updated 11 months ago
lechmazur / generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm…
☆72 · Apr 16, 2026 · Updated 3 months ago
usrname0 / YaGGUF
A user-friendly GUI for llama.cpp — convert, quantize, and run GGUF models without touching the terminal.
☆21 · Jun 9, 2026 · Updated last month
kallewoof / gguf-eval
Evaluation framework for GGUF
☆15 · Apr 2, 2026 · Updated 3 months ago
MockLoop / mockloop-mcp
Intelligent Model Context Protocol (MCP) server for AI-assisted API development. Generate mock servers from OpenAPI specs with advanced l…
☆16 · Updated this week
lechmazur / nyt-connections
Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words
☆230 · Updated this week
hemanth / mcp-web-client
A web-based client for connecting to MCP servers with OAuth support
☆17 · Jul 16, 2026 · Updated last week
Pyligent / Finance-Assistant-with-MCP-and-Langchain
a conversational finance assistant that provides users with real-time stock quotes, market news, and insights on market movers through na…
☆17 · Apr 26, 2025 · Updated last year
ml-research / ActivationReasoning
☆15 · May 21, 2026 · Updated 2 months ago
PhialsBasement / GUI-MCP
A Blueprint-style visual node editor for creating FastMCP servers. Build MCP tools, resources, and prompts by connecting nodes - no codin…
☆25 · Dec 8, 2025 · Updated 7 months ago
AIAnytime / SLIM-Models-by-LLMWare
SLIM Models by LLMWare. A streamlit app showing the capabilities for AI Agents and Function Calls.
☆21 · Feb 11, 2024 · Updated 2 years ago
thekingofkings / chicago-crime
Crime correlation analysis
☆10 · Aug 8, 2018 · Updated 7 years ago
ikiruneo / millionaire-bench
German "Who Wants To Be A Millionaire" LLM Benchmarking.
☆50 · Updated this week
Oqura-ai / deepresearch-datagen-cli
Using deep research workflow to generate datasets for finetuning LLMs.
☆40 · Oct 9, 2025 · Updated 9 months ago
SpillwaveSolutions / architect-agent
architect-agent
☆19 · Dec 31, 2025 · Updated 6 months ago
aws-samples / sample-agentic-ai-factory
☆16 · Jun 30, 2026 · Updated 3 weeks ago
fairydreaming / lineage-bench
Testing LLM reasoning abilities with lineage relationship quizzes.
☆44 · Mar 10, 2026 · Updated 4 months ago
neosun100 / supertonic-tts-enhanced
Enhanced Supertonic TTS with Docker, FastAPI, Web UI, and comprehensive API documentation
☆21 · Dec 7, 2025 · Updated 7 months ago
worldbank / metadata-editor-docs
Metadata Editor user and practice guide
☆19 · Jul 9, 2026 · Updated 2 weeks ago
samuel-vitorino / MovieSearch
Search movies using RAG and LLMs
☆19 · Sep 4, 2024 · Updated last year
remichu-ai / gallamaUI
☆23 · May 14, 2026 · Updated 2 months ago
crashr / llama-stream
☆24 · Feb 17, 2026 · Updated 5 months ago
aws-samples / sample-connected-mobility-solution-on-aws
Accelerate development and deployment of connected vehicle assets with purpose-built, deployment-ready accelerators.
☆24 · Jul 30, 2025 · Updated 11 months ago
jonasfrey / gpu-monitor-browser-gui
a browser gui for nvidia smi
☆21 · Mar 17, 2025 · Updated last year
sparkle-reasoning / sparkle
[NeurIPS'25] Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
☆16 · Dec 12, 2025 · Updated 7 months ago
vianarafael / codechrono
helps you estimate how long software tasks will take
☆23 · May 11, 2025 · Updated last year
ButterflyRSI / Butterfly-RSI
Recursive self-correcting intelligence framework
☆17 · Nov 13, 2025 · Updated 8 months ago
chrisdunlopnz / meetingprompts
☆24 · Sep 5, 2025 · Updated 10 months ago
themanaworld / evol-hercules
Mirror from gitlab
☆11 · Jan 9, 2021 · Updated 5 years ago
lechmazur / step_game
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM…
☆89 · Dec 9, 2025 · Updated 7 months ago
Cubified / termgl
A terminal-based renderer for OpenGL shaders. Like Shadertoy, but in the terminal.
☆12 · Sep 24, 2023 · Updated 2 years ago
2501Pr0ject / RAGnarok-AI
Local-first RAG evaluation framework for LLM applications. 100% local, no API keys required.
☆16 · Apr 18, 2026 · Updated 3 months ago
Tencent-Hunyuan / Hunyuan-4B
☆16 · Aug 5, 2025 · Updated 11 months ago
Vesely / skills
My personal collection of Claude Code skills
☆27 · Updated this week
TesslateAI / Agent-Builder
☆214 · Sep 7, 2025 · Updated 10 months ago
lorenzodimauro97 / FileCollector
☆19 · Aug 19, 2025 · Updated 11 months ago
snaga / pgxl-deployment-tools
Postgres-XL Deployment Tools
☆18 · Jun 10, 2014 · Updated 12 years ago