Open-source library for scalable, reproducible evaluation of AI models and benchmarks.
☆206Feb 28, 2026Updated this week
Alternatives and similar repositories for Evaluator
Users who are interested in Evaluator compare it to the libraries listed below
- StrategyQA dataset translation☆23Apr 12, 2024Updated last year
- ☆10Jun 5, 2025Updated 8 months ago
- BERT score for text generation☆12Jan 15, 2025Updated last year
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models☆25Aug 24, 2024Updated last year
- Bias, Hate classification with KoELECTRA 👿☆27Jun 12, 2023Updated 2 years ago
- Reproducible and flexible LLM evaluations for scientific reasoning.☆26Jul 23, 2025Updated 7 months ago
- 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.☆17Jun 5, 2025Updated 8 months ago
- Official repository for K-EXAONE built by LG AI Research☆69Feb 6, 2026Updated 3 weeks ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…☆150Updated this week
- Rust crate for some audio utilities☆27Mar 8, 2025Updated 11 months ago
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset)☆20May 18, 2024Updated last year
- REverse-Engineered Reasoning for Open-Ended Generation☆93Sep 10, 2025Updated 5 months ago
- KURE: an embedding model specialized for Korean retrieval, developed at Korea University☆206Feb 20, 2026Updated last week
- [KO-Platy🥮] KO-platypus model, fine-tuned from llama-2-ko using Korean-Open-platypus☆73Aug 24, 2025Updated 6 months ago
- ☆31Sep 12, 2025Updated 5 months ago
- A small rust-based data loader☆36Feb 20, 2026Updated last week
- Process Orchestration Framework: A camunda 7 fork☆21Feb 23, 2026Updated last week
- Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025.☆25May 15, 2025Updated 9 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆38Aug 4, 2025Updated 6 months ago
- SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving☆57Jan 13, 2026Updated last month
- Verifiers for LLM Reinforcement Learning☆80Apr 15, 2025Updated 10 months ago
- This is a hands-on for ML beginners to perform SimCSE step-by-step. Implemented both supervised SimCSE and unsupervised SimCSE, and dist…☆22Oct 6, 2023Updated 2 years ago
- This repository contains data, code and models for contextual noncompliance.☆25Jul 18, 2024Updated last year
- The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as…☆18Sep 17, 2025Updated 5 months ago
- [ACL25] FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation☆44Jan 28, 2026Updated last month
- ☆109Dec 10, 2025Updated 2 months ago
- KoCLIP: Korean port of OpenAI CLIP, in Flax☆154Dec 28, 2025Updated 2 months ago
- ☆39Feb 7, 2025Updated last year
- 🌪️ AI research assistant that generates Wikipedia-quality articles through multi-perspective analysis. Based on Stanford's STORM methodo…☆50Jun 6, 2025Updated 8 months ago
- [ICLR 2026] JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence☆76Feb 9, 2026Updated 3 weeks ago
- The most modern LLM evaluation toolkit☆70Nov 9, 2025Updated 3 months ago
- SDLC Copilot is an Agentic AI system designed to streamline and automate the Software Development Lifecycle (SDLC). From requirement gath…☆23Jun 14, 2025Updated 8 months ago
- Benchmark in Korean Context☆138Sep 26, 2023Updated 2 years ago
- A GitHub repo that makes the hwpxlib package easy to use from Python.☆36Mar 29, 2025Updated 11 months ago
- ☆10Apr 26, 2023Updated 2 years ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs☆94Nov 17, 2024Updated last year
- Clinical NLP Shared Task @ NAACL'24☆42Aug 20, 2025Updated 6 months ago
- A Grand Sumo prediction game☆10Feb 24, 2026Updated last week
- Detect-Then-Explain Framework for Text-to-SQL task☆10Dec 6, 2023Updated 2 years ago