johnbean393/SVGBench

SVGBench: A challenging LLM benchmark that tests the knowledge, coding, and physical reasoning capabilities of LLMs.

☆73

Alternatives and similar repositories for SVGBench

Users who are interested in SVGBench are comparing it to the libraries listed below.

Orolol / familyBench
FamilyBench evaluation tool for testing the relational reasoning capabilities of Large Language Models (LLMs).
☆47 · May 4, 2026 · Updated 2 months ago
aishvar / gamebench
☆19 · Jan 10, 2026 · Updated 6 months ago
lechmazur / generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm…
☆72 · Apr 16, 2026 · Updated 3 months ago
alientony / Split-brain
This is a training method to produce a split brain model
☆14 · Mar 7, 2025 · Updated last year
severian42 / SIREN
A Field-Theoretic Approach to Unbounded Memory in Large Language Models
☆20 · Apr 15, 2025 · Updated last year
lechmazur / pgg_bench
Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a…
☆41 · Apr 10, 2025 · Updated last year
yhenon / llm-face-vision
Benchmarking vision language models on face tasks
☆16 · Mar 30, 2025 · Updated last year
oumi-ai / halloumi-demo
Try out HallOumi, a state-of-the-art claim verification model in a simple UI!
☆41 · Apr 2, 2025 · Updated last year
fajrmn / kokoro-on-browser
☆16 · Feb 1, 2025 · Updated last year
lechmazur / pact
A benchmark for conversational bargaining by language models. In each 20‑round match one LLM plays buyer, one plays seller, and both hold…
☆44 · Jun 23, 2026 · Updated last month
lechmazur / nyt-connections
Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words
☆230 · Updated this week
hyperfocAIs / Attend
Attend - to what matters.
☆17 · Feb 22, 2025 · Updated last year
Sense-GVT / unig3d_point-e
☆15 · May 17, 2023 · Updated 3 years ago
sgasser / diffshot-ai
See how code changes affect your UI across viewports, themes, and languages
☆15 · Jul 17, 2025 · Updated last year
squest / zenx-integrated-learning
Learning problem-solving, logic/set, math, physics, economics through functional programming using Haskell
☆19 · Oct 16, 2015 · Updated 10 years ago
jd-3d / SOLOBench
☆136 · May 2, 2025 · Updated last year
dceluis / ln-diff
Line-numbered patch format. Non-sequential, LLM- and stream-friendly
☆15 · Nov 7, 2024 · Updated last year
XueruiSu / Trust-Region-Preference-Approximation
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
☆15 · Jun 28, 2025 · Updated last year
Zechao-Guan / TopoDiT-3D
☆15 · May 13, 2025 · Updated last year
adobe-research / NoLiMa
Official repository for "NoLiMa: Long-Context Evaluation Beyond Literal Matching"
☆202 · Jul 17, 2025 · Updated last year
bernardolsp / open-webui-agent-function
Function for agents in OpenWebUI
☆17 · Jun 15, 2025 · Updated last year
summersonnn / reddit_analyzer
Analyze Reddit posts
☆32 · Jun 5, 2026 · Updated last month
LanDiff / LanDiff
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
☆41 · May 4, 2025 · Updated last year
lechmazur / step_game
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM…
☆89 · Dec 9, 2025 · Updated 7 months ago
GVDub / panai-seed-node
“A locally hosted, memory-aware AI microservice—designed for cultural continuity, decentralized intelligence, and ethical autonomy.”
☆27 · May 1, 2025 · Updated last year
tkalevra / FaultLine
Validated, private, shareable knowledge-graph memory for AI — per-tenant, write-gated, PostgreSQL-authoritative, served over MCP.
☆18 · Jul 19, 2026 · Updated last week
ButterflyRSI / Butterfly-RSI
Recursive self-correcting intelligence framework
☆17 · Nov 13, 2025 · Updated 8 months ago
sukanto-m / directory-monitor
☆16 · Oct 28, 2025 · Updated 9 months ago
haoxiongliu / ProofAug
"Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis" (ICML 2025) official implementation.
☆16 · Jun 8, 2025 · Updated last year
simple-bench / SimpleBench
☆212 · Dec 20, 2024 · Updated last year
yuanpengtu / VideoAnydoor
[SIGGRAPH'25] Official repository of VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
☆34 · Dec 5, 2025 · Updated 7 months ago
lechmazur / deception
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud…
☆33 · Mar 20, 2025 · Updated last year
open-webui / benchmark
☆25 · Jan 28, 2026 · Updated 6 months ago
Miracle-Messi / Isa-AutoFormal
☆17 · Oct 27, 2024 · Updated last year
get-convex / ai-world-fair
☆12 · Jun 1, 2026 · Updated last month
odino / touchy
Remote controlling your keyboard since 2016.
☆10 · Jun 16, 2016 · Updated 10 years ago
YuantianDing / HilbertProver
An Automatic Theorem Prover for Hilbert System, generating nearly-minimal proofs.
☆14 · Jan 21, 2025 · Updated last year
Graphic-Kiliani / Tri2Quad-Geometry-Aware-Triangle-to-Quad-Mesh-Conversion-Operator
Developed a high-performance triangle-to-quad conversion operator, formulated as a maximum-weight matching problem on the triangle adjace…
☆17 · May 28, 2026 · Updated 2 months ago
tanavc1 / local-llm-autotune
Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, …
☆32 · Jun 30, 2026 · Updated 3 weeks ago