Open-source library for scalable, reproducible evaluation of AI models and benchmarks.
☆301Jun 11, 2026Updated this week
Alternatives and similar repositories for Evaluator
Users that are interested in Evaluator are comparing it to the libraries listed below.
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"☆15Aug 26, 2025Updated 9 months ago
- Navigator Helpers☆11Nov 7, 2024Updated last year
- BERT score for text generation☆12Jan 15, 2025Updated last year
- 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.☆17Jun 5, 2025Updated last year
- 🔭 interactively explore `onnx` networks in your CLI.☆27Jun 7, 2024Updated 2 years ago
- [KO-Platy🥮] KO-platypus model: llama-2-ko fine-tuned using Korean-Open-platypus☆73Aug 24, 2025Updated 9 months ago
- [COLING'25] Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language Question And Database Schema With Pseudo-Schema☆23Jul 9, 2025Updated 11 months ago
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models☆25Aug 24, 2024Updated last year
- ☆55Feb 11, 2025Updated last year
- This repository contains data, code and models for contextual noncompliance.☆26Jul 18, 2024Updated last year
- Translation of the StrategyQA dataset☆22Apr 12, 2024Updated 2 years ago
- KURE: an embedding model specialized for Korean retrieval, developed at Korea University☆221Apr 14, 2026Updated 2 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆183Apr 15, 2026Updated 2 months ago
- Evaluate and improve models and agents using environments☆981Updated this week
- Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025.☆26May 15, 2025Updated last year
- Rust crate for some audio utilities☆28Mar 8, 2025Updated last year
- Code for the paper "Closing the Curious Case of Neural Text Degeneration"☆12Apr 9, 2025Updated last year
- Korean psychological counseling dataset☆81Jun 20, 2023Updated 2 years ago
- Bias, Hate classification with KoELECTRA 👿☆27Jun 12, 2023Updated 3 years ago
- Reproducible and flexible LLM evaluations for scientific reasoning.☆28Jul 23, 2025Updated 10 months ago
- Private Preview: Responsible AI Tooling in Azure Machine Learning☆18Mar 28, 2022Updated 4 years ago
- CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean☆48Dec 23, 2024Updated last year
- ☆31Sep 12, 2025Updated 9 months ago
- A small rust-based data loader☆37Feb 20, 2026Updated 3 months ago
- Miscellaneous codes and writings for MLOps☆15Apr 8, 2026Updated 2 months ago
- KoCLIP: Korean port of OpenAI CLIP, in Flax☆155Dec 28, 2025Updated 5 months ago
- FairCVtest: Testbed for Fair Automatic Recruitment and Multimodal Bias Analysis☆21Jul 24, 2023Updated 2 years ago
- #인권코퍼스 (Human Rights Corpus)☆31Oct 6, 2023Updated 2 years ago
- SLM-SQL: An Exploration of Small Language Models for Text-to-SQL☆35Aug 12, 2025Updated 10 months ago
- Comprehensive LLM evaluation at scale: A production-ready framework for evaluating large language models across multiple benchmarks.☆41Updated this week
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback☆12Jul 13, 2022Updated 3 years ago
- Test LLMs against jailbreaks and unprecedented harms☆40Oct 19, 2024Updated last year
- REverse-Engineered Reasoning for Open-Ended Generation☆97Sep 10, 2025Updated 9 months ago
- This is a hands-on for ML beginners to perform SimCSE step-by-step. Implemented both supervised SimCSE and unsupervised SimCSE, and dist…☆22Oct 6, 2023Updated 2 years ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"☆19Jun 11, 2025Updated last year
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details.☆14Mar 20, 2024Updated 2 years ago
- A tool to configure, launch and manage your machine learning experiments.☆247Updated this week
- Evaluating language model responses using a reward model☆30Feb 23, 2024Updated 2 years ago
- Text generation using language models with multiple exit heads☆16Sep 18, 2025Updated 8 months ago