Open-source library for scalable, reproducible evaluation of AI models and benchmarks.
☆253 · Apr 10, 2026 · Updated this week
Alternatives and similar repositories for Evaluator
Users that are interested in Evaluator are comparing it to the libraries listed below.
- ☆10 · Jun 5, 2025 · Updated 10 months ago
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation" ☆15 · Aug 26, 2025 · Updated 7 months ago
- BERT score for text generation ☆12 · Jan 15, 2025 · Updated last year
- 🔭 interactively explore `onnx` networks in your CLI. ☆26 · Jun 7, 2024 · Updated last year
- 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. ☆17 · Jun 5, 2025 · Updated 10 months ago
- [KO-Platy🥮] KO-platypus model, created by fine-tuning llama-2-ko on Korean-Open-platypus ☆73 · Aug 24, 2025 · Updated 7 months ago
- [COLING'25] Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language Question And Database Schema With Pseudo-Schema ☆24 · Jul 9, 2025 · Updated 9 months ago
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models ☆25 · Aug 24, 2024 · Updated last year
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆49 · Jul 6, 2024 · Updated last year
- ☆53 · Feb 11, 2025 · Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆25 · Jul 18, 2024 · Updated last year
- A Finance Dataset Benchmark for Natural Language Queries ☆26 · Dec 7, 2020 · Updated 5 years ago
- Text files from Aozora Bunko (青空文庫) ☆14 · Feb 4, 2024 · Updated 2 years ago
- Build RL environments for LLM training ☆812 · Updated this week
- KURE: an embedding model specialized for Korean-language retrieval, developed at Korea University ☆209 · Apr 4, 2026 · Updated last week
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆182 · Feb 26, 2026 · Updated last month
- Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025. ☆26 · May 15, 2025 · Updated 10 months ago
- Rust crate for some audio utilities ☆27 · Mar 8, 2025 · Updated last year
- Programmatically edit the W&B UI ☆22 · Mar 31, 2026 · Updated last week
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 · Mar 25, 2025 · Updated last year
- Code for the paper "Closing the Curious Case of Neural Text Degeneration" ☆12 · Apr 9, 2025 · Updated last year
- Korean psychological counseling dataset ☆81 · Jun 20, 2023 · Updated 2 years ago
- Bias, Hate classification with KoELECTRA 👿 ☆27 · Jun 12, 2023 · Updated 2 years ago
- Reproducible and flexible LLM evaluations for scientific reasoning. ☆27 · Jul 23, 2025 · Updated 8 months ago
- Private Preview: Responsible AI Tooling in Azure Machine Learning ☆18 · Mar 28, 2022 · Updated 4 years ago
- ☆31 · Sep 12, 2025 · Updated 7 months ago
- Comprehensive LLM evaluation at scale: A production-ready framework for evaluating large language models across multiple benchmarks. ☆38 · Updated this week
- Miscellaneous codes and writings for MLOps ☆15 · Updated this week
- Benchmark in Korean Context ☆138 · Sep 26, 2023 · Updated 2 years ago
- The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as… ☆19 · Sep 17, 2025 · Updated 6 months ago
- KoCLIP: Korean port of OpenAI CLIP, in Flax ☆155 · Dec 28, 2025 · Updated 3 months ago
- #인권코퍼스 (human rights corpus) ☆31 · Oct 6, 2023 · Updated 2 years ago
- SLM-SQL: An Exploration of Small Language Models for Text-to-SQL ☆32 · Aug 12, 2025 · Updated 8 months ago
- ☆123 · Apr 3, 2026 · Updated last week
- Open-source Korean language models ☆83 · May 4, 2023 · Updated 2 years ago
- 🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data. ☆1,580 · Updated this week
- A recipe for constituency parsing, disfluency tagging and obtaining the fluent transcripts of English Fisher dataset ☆13 · May 2, 2021 · Updated 4 years ago
- Test LLMs against jailbreaks and unprecedented harms ☆39 · Oct 19, 2024 · Updated last year
- REverse-Engineered Reasoning for Open-Ended Generation ☆94 · Sep 10, 2025 · Updated 7 months ago