Open-source library for scalable, reproducible evaluation of AI models and benchmarks.
☆271Apr 30, 2026Updated this week
Alternatives and similar repositories for Evaluator
Users that are interested in Evaluator are comparing it to the libraries listed below.
- ☆10Jun 5, 2025Updated 10 months ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…☆253Updated this week
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"☆15Aug 26, 2025Updated 8 months ago
- 🔭 interactively explore `onnx` networks in your CLI.☆26Jun 7, 2024Updated last year
- 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.☆17Jun 5, 2025Updated 10 months ago
- [KO-Platy🥮] KO-platypus model: llama-2-ko fine-tuned using Korean-Open-platypus☆73Aug 24, 2025Updated 8 months ago
- [COLING'25] Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language Question And Database Schema With Pseudo-Schema☆24Jul 9, 2025Updated 9 months ago
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models☆25Aug 24, 2024Updated last year
- YetAnotherWandbClient☆13Mar 16, 2026Updated last month
- This repository contains data, code and models for contextual noncompliance.☆25Jul 18, 2024Updated last year
- Translation of the StrategyQA dataset☆22Apr 12, 2024Updated 2 years ago
- A Finance Dataset Benchmark for Natural Language Queries☆26Dec 7, 2020Updated 5 years ago
- KURE: an embedding model specialized for Korean retrieval, developed at Korea University☆211Apr 14, 2026Updated 2 weeks ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆183Apr 15, 2026Updated 2 weeks ago
- Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025.☆26May 15, 2025Updated 11 months ago
- Rust crate for some audio utilities☆28Mar 8, 2025Updated last year
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"☆12Mar 25, 2025Updated last year
- Make nice plots with matplotlib.☆11Oct 8, 2019Updated 6 years ago
- 🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.☆1,746Updated this week
- Korean psychological counseling dataset☆80Jun 20, 2023Updated 2 years ago
- Bias, Hate classification with KoELECTRA 👿☆27Jun 12, 2023Updated 2 years ago
- Private Preview: Responsible AI Tooling in Azure Machine Learning☆18Mar 28, 2022Updated 4 years ago
- ☆31Sep 12, 2025Updated 7 months ago
- A small rust-based data loader☆37Feb 20, 2026Updated 2 months ago
- Miscellaneous codes and writings for MLOps☆15Apr 8, 2026Updated 3 weeks ago
- Benchmark in Korean Context☆137Sep 26, 2023Updated 2 years ago
- KoCLIP: Korean port of OpenAI CLIP, in Flax☆155Dec 28, 2025Updated 4 months ago
- #HumanRightsCorpus☆31Oct 6, 2023Updated 2 years ago
- SLM-SQL: An Exploration of Small Language Models for Text-to-SQL☆32Aug 12, 2025Updated 8 months ago
- ☆126Apr 26, 2026Updated last week
- Comprehensive LLM evaluation at scale: A production-ready framework for evaluating large language models across multiple benchmarks.☆41Updated this week
- Open-source Korean language model☆83May 4, 2023Updated 2 years ago
- A recipe for constituency parsing, disfluency tagging and obtaining the fluent transcripts of English Fisher dataset☆13May 2, 2021Updated 5 years ago
- Test LLMs against jailbreaks and unprecedented harms☆39Oct 19, 2024Updated last year
- REverse-Engineered Reasoning for Open-Ended Generation☆95Sep 10, 2025Updated 7 months ago
- This is a hands-on for ML beginners to perform SimCSE step-by-step. Implemented both supervised SimCSE and unsupervised SimCSE, and dist…☆22Oct 6, 2023Updated 2 years ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"☆19Jun 11, 2025Updated 10 months ago
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details.☆14Mar 20, 2024Updated 2 years ago
- A tool to configure, launch and manage your machine learning experiments.☆241Updated this week