Lightblue LLM Eval Framework: tengu, elyza100, ja-mtbench, rakuda
☆18Jan 6, 2026Updated 2 months ago
Alternatives and similar repositories for shaberi
Users that are interested in shaberi are comparing it to the libraries listed below
- AJIMEE-Bench (Advanced Japanese IME Evaluation Benchmark)☆18Jan 13, 2025Updated last year
- ☆17May 31, 2023Updated 2 years ago
- Latest version of MedEX/J (Japanese disease name extractor)☆18May 17, 2022Updated 3 years ago
- ☆19May 23, 2024Updated last year
- Swallow Project large language model evaluation scripts☆24Sep 17, 2025Updated 5 months ago
- ☆30Jun 3, 2024Updated last year
- NoMIRACL: A multilingual hallucination evaluation dataset to evaluate LLM robustness in RAG against first-stage retrieval errors on 18 la…☆26Nov 29, 2024Updated last year
- Benchmark for Japanese document embedding & vector search☆29Mar 12, 2024Updated last year
- ☆24Dec 15, 2023Updated 2 years ago
- NLP 100 Exercise 2025☆40Apr 9, 2025Updated 11 months ago
- A lightweight framework for evaluating visual-language models.☆41Jan 16, 2026Updated last month
- Japanese Massive Multitask Language Understanding Benchmark☆38Oct 7, 2025Updated 5 months ago
- Logical inference system based on event semantics and degree semantics in formal semantics☆11Jan 22, 2023Updated 3 years ago
- Japanese chat dataset for building LLMs☆88Jan 23, 2024Updated 2 years ago
- JQaRA: Japanese Question Answering with Retrieval Augmentation - a Japanese Q&A dataset for evaluating retrieval augmentation (RAG)☆43Sep 9, 2025Updated 6 months ago
- ☆35Aug 4, 2021Updated 4 years ago
- 📚 My personal website☆10Jun 2, 2025Updated 9 months ago
- Public release of materials for the JSAI2019 tutorial talk "Semantic Technologies Based on Ontology Engineering"☆12Jun 7, 2019Updated 6 years ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical …☆14Jan 25, 2026Updated last month
- Evaluation Pipeline for medical tasks.☆12Feb 13, 2026Updated 3 weeks ago
- PPH in C☆24Nov 21, 2025Updated 3 months ago
- ☆13May 11, 2021Updated 4 years ago
- WaPEN's grammar made more Python-like☆14Updated this week
- LLM evaluation project for Japanese tasks☆92Feb 4, 2026Updated last month
- Regex-based tail written in Rust☆10Mar 20, 2023Updated 2 years ago
- Code for the paper "Modeling Information Change in Science Communication with Semantically Matched Paraphrases" from EMNLP 2022☆13Oct 20, 2022Updated 3 years ago
- Benchmarks for Evaluating Spanish Language Models☆11Jun 14, 2023Updated 2 years ago
- Arduino based BBQ Thermometer utilizing a Maverick BBQ Remote Thermometer☆10Jan 12, 2015Updated 11 years ago
- msglm makes it a little easier to create messages for language models like Claude and OpenAI GPTs.☆14Jan 29, 2026Updated last month
- ☆10Sep 14, 2022Updated 3 years ago
- PyTorch implementation of deep fill v2 (original by Jiayu et al.)☆10Jun 26, 2019Updated 6 years ago
- Code and dataset for the EMNLP 2020 paper "PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge"☆12Nov 6, 2020Updated 5 years ago
- A shareable Renovate config for Cybozu☆11Updated this week
- Metal on Symbol☆12Mar 4, 2024Updated 2 years ago
- Ruby on Rails app for recording daily data on the "COVID-19 Situation" from the KemKes website.☆11Mar 8, 2023Updated 3 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"☆13Feb 14, 2022Updated 4 years ago
- Transparent Reporting of Ethics for Generative AI (TREGAI) Checklist☆15Oct 16, 2024Updated last year
- A methodology designed to measure the contribution of the features to the predictive performance of any econometric or machine learning m…☆18Nov 28, 2024Updated last year
- ☆12Aug 6, 2024Updated last year