wandb/llm-leaderboard


wandb / llm-leaderboard

A project for evaluating LLMs on Japanese tasks

☆94

Alternatives and similar repositories for llm-leaderboard

Users who are interested in llm-leaderboard are comparing it to the libraries listed below.


wandb / llm-kr-eval
☆20 · Jul 24, 2024 · Updated last year
llm-jp / llm-jp-eval
☆155 · Apr 28, 2026 · Updated last month
llm-jp / llm-jp-sft
☆62 · Jun 13, 2024 · Updated last year
swallow-llm / swallow-evaluation
Evaluation scripts for large language models from the Swallow project
☆24 · Sep 17, 2025 · Updated 8 months ago
ku-nlp / ja-vicuna-qa-benchmark
☆33 · Jul 31, 2024 · Updated last year
Stability-AI / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆154 · Sep 13, 2024 · Updated last year
kunishou / do-not-answer-ja
☆24 · Dec 15, 2023 · Updated 2 years ago
osekilab / JCoLA
☆19 · Apr 21, 2026 · Updated last month
yahoojapan / JGLUE
JGLUE: Japanese General Language Understanding Evaluation
☆342 · Mar 31, 2025 · Updated last year
nlp-waseda / JMMLU
Japanese Massive Multitask Language Understanding Benchmark
☆40 · Oct 7, 2025 · Updated 7 months ago
HojiChar / HojiChar
The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing.
☆126 · Apr 10, 2026 · Updated last month
llm-jp / llm-jp-eval-mm
A lightweight framework for evaluating visual-language models.
☆41 · Apr 20, 2026 · Updated last month
LG-AI-EXAONE / KoMT-Bench
Official repository for KoMT-Bench built by LG AI Research
☆73 · Aug 8, 2024 · Updated last year
nobu-g / JGLUE-evaluation-scripts
Training and evaluation scripts for JGLUE, a Japanese language understanding benchmark
☆18 · May 20, 2026 · Updated last week
teddysum / korean_evaluation
☆10 · Jun 5, 2025 · Updated 11 months ago
llm-jp / awesome-japanese-llm
Overview of Japanese LLMs
☆1,400 · May 20, 2026 · Updated last week
turingmotors / vlm-recipes
☆20 · Aug 28, 2024 · Updated last year
kakao / FunctionChat-Bench
☆116 · Feb 25, 2026 · Updated 3 months ago
J-Seo / KoCommonGEN-V2
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
☆25 · Aug 24, 2024 · Updated last year
deep-diver / hllama
hllama is a library which aims to provide a set of utility tools for large language models.
☆10 · Apr 16, 2024 · Updated 2 years ago
hitachi-nlp / FLD-corpus
☆19 · Dec 6, 2024 · Updated last year
corca-ai / evaluating-gpt-4o-on-CLIcK
Evaluate gpt-4o on CLIcK (Korean NLP Dataset)
☆20 · May 18, 2024 · Updated 2 years ago
Stability-AI / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆51 · Jul 5, 2024 · Updated last year
verypluming / JSICK
Repository for JSICK
☆46 · May 31, 2023 · Updated 2 years ago
jqk09a / japanese-daily-dialogue
☆56 · Mar 17, 2023 · Updated 3 years ago
Stability-AI / gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
☆13 · Jun 7, 2023 · Updated 2 years ago
ku-nlp / bertknp
A Japanese dependency parser based on BERT
☆23 · Oct 26, 2022 · Updated 3 years ago
shisa-ai / shaberi
Lightblue LLM Eval Framework: tengu, elyza100, ja-mtbench, rakuda
☆18 · Apr 29, 2026 · Updated last month
kunishou / oasst1-89k-ja
☆16 · Nov 19, 2023 · Updated 2 years ago
Marker-Inc-Korea / KoLLM_Eval
Consolidated collection of Korean benchmark evaluation code (?)
☆21 · Nov 15, 2024 · Updated last year
sbintuitions / flexeval
Flexible evaluation tool for language models
☆59 · May 21, 2026 · Updated last week
WorksApplications / uzushio
☆24 · Mar 18, 2026 · Updated 2 months ago
sionic-ai / Llama4-Token-Editor
☆64 · Jul 21, 2025 · Updated 10 months ago
masanorihirano / llm-japanese-dataset
Japanese chat dataset for building LLMs
☆87 · Jan 23, 2024 · Updated 2 years ago
yuzu-ai / japanese-llm-ranking
☆50 · Apr 10, 2024 · Updated 2 years ago
AkimfromParis / RAG-Japanese
Open-source RAG with Llama Index for Japanese LLMs in a low-resource setting
☆10 · May 12, 2025 · Updated last year
davidkim205 / kollm_evaluation
Evaluation of Korean models using an in-house Korean evaluation dataset
☆31 · May 31, 2024 · Updated last year
leia-llm / leia
LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
☆23 · Apr 24, 2024 · Updated 2 years ago
insoochung / transformer_bcq
BCQ tutorial for transformers
☆16 · Jul 17, 2023 · Updated 2 years ago