babelcloud/LLM-RGB

LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.

☆164

Alternatives and similar repositories for LLM-RGB

Users interested in LLM-RGB also compare it to the repositories listed below.

thirdgerb / ghost-in-shells
WIP: a project for engineering automatic bots (mainly chatbots)
☆13 · Sep 3, 2023 · Updated 2 years ago
LLM360 / k2-data-prep
☆21 · Jun 4, 2024 · Updated 2 years ago
WHU-ZQH / DUP
☆16 · Mar 6, 2025 · Updated last year
terryyz / llm-benchmark
A list of LLM benchmark frameworks.
☆75 · Feb 17, 2024 · Updated 2 years ago
kevinyaobytedance / llm_eval
LLM evaluation.
☆16 · Nov 7, 2023 · Updated 2 years ago
boostcampaitech3 / final-project-level3-cv-17
[2022.05.16 ~ 2022.06.10] 🌤️Clear photos free of fine dust📷 - Boostcamp AI Tech 3rd cohort final project
☆14 · Jun 11, 2022 · Updated 4 years ago
LostCow / KLUE
KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC)
☆25 · Apr 11, 2022 · Updated 4 years ago
THUDM / AlignBench
A multi-dimensional Chinese alignment evaluation benchmark for large models (ACL 2024)
☆430 · Oct 25, 2025 · Updated 9 months ago
novex-ai / parallel-parrot
data prep utilities for LLMs, using LLMs
☆16 · Nov 7, 2023 · Updated 2 years ago
boostcampaitech2 / final-project-level3-nlp-08
Look, Attend and Generate Poem - a sentimental-poet service that looks at a photo and writes a poem about it
☆25 · Jan 20, 2022 · Updated 4 years ago
MetaCopilot / dseval
☆33 · Jun 24, 2024 · Updated 2 years ago
DhruvAtreja / ALAS
ALAS: Autonomous Learning Agent System
☆18 · Aug 14, 2025 · Updated 11 months ago
AIAnytime / Evaluation-of-LLMs-and-RAGs
A complete guide to evaluate LLMs and RAGs. Both theory and code based approaches covered.
☆28 · Nov 16, 2023 · Updated 2 years ago
wandb / llm-leaderboard
Project for evaluating LLMs on Japanese tasks
☆94 · Jul 15, 2026 · Updated last week
Knowledgator / unlimited_classifier
Universal text classifier for generative models
☆25 · Jul 25, 2024 · Updated 2 years ago
yoonhero / jamo_llm
A high school student's simple stochastic-parrot build
☆19 · Sep 2, 2023 · Updated 2 years ago
godaai / llm-inference
Resources for Large Language Model Inference
☆17 · Dec 29, 2023 · Updated 2 years ago
IngestAI / deepmark
Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so yo…
☆104 · Nov 24, 2023 · Updated 2 years ago
hist0613 / arxivbot
☆61 · Jul 14, 2026 · Updated last week
advboxes / perceptron-benchmark
Robustness benchmark for DNN models.
☆66 · Aug 8, 2022 · Updated 3 years ago
PatWie / polyglot_ls
An LLM-based LS implementation that makes use of tree-sitter context to perform code actions
☆16 · Sep 6, 2024 · Updated last year
sparticleinc / ASEED
Conversational Retrieval Evaluation Dataset
☆99 · Aug 19, 2025 · Updated 11 months ago
hyunwoongko / pydatrie
Pure python implementation of DARTS (Double ARray Trie System)
☆24 · Dec 7, 2022 · Updated 3 years ago
remichu-ai / gallama
☆137 · Jun 30, 2026 · Updated 3 weeks ago
znfgnu / easy-agent
Simple agent framework using Ollama tool calling
☆10 · Aug 27, 2024 · Updated last year
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
zaydzuhri / flame
Fork of the Flame repo for training some new things in development
☆20 · Jul 15, 2026 · Updated last week
thu-coai / LongSafety
[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
☆16 · Jun 18, 2025 · Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Jan 17, 2024 · Updated 2 years ago
JacobHuang91 / prompt-refiner
🚀 Lightweight Python library for building production LLM applications with smart context management and automatic token optimization. Sa…
☆38 · Apr 12, 2026 · Updated 3 months ago
OthersideAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆12 · Nov 27, 2023 · Updated 2 years ago
replicate / cog-safe-push
Safely push a Cog model version by making sure it works and is backwards-compatible with previous versions.
☆17 · Dec 4, 2025 · Updated 7 months ago
IPRC-DIP / ANPL
☆23 · Dec 7, 2023 · Updated 2 years ago
PeterGriffinJin / Graph-CoT
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs (ACL 2024)
☆311 · Dec 24, 2024 · Updated last year
neo4j-product-examples / data-prep-sec-edgar
Prepare SEC EDGAR data for working examples
☆20 · Feb 7, 2024 · Updated 2 years ago
ASGuard-UCI / MSF-ADV
MSF-ADV is a novel physical-world adversarial attack method, which can fool the Multi Sensor Fusion (MSF) based autonomous driving (AD) p…
☆84 · Aug 4, 2021 · Updated 4 years ago
Marker-Inc-Korea / AutoRAG-example-korean-embedding-benchmark
AutoRAG example about benchmarking Korean embeddings.
☆46 · Oct 2, 2024 · Updated last year
wenzhe-li / Self-MoA
☆17 · Feb 4, 2025 · Updated last year
multimodal-art-projection / CodeCriticBench
☆16 · Nov 1, 2025 · Updated 8 months ago