blcuicall / OMGEval
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Foundation Models
★36 · Updated last year
Alternatives and similar repositories for OMGEval
Users interested in OMGEval are comparing it to the libraries listed below.
- EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration ★36 · Updated last year
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ★119 · Updated 7 months ago
- ★58 · Updated last year
- ★28 · Updated 3 years ago
- ★90 · Updated last year
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ★50 · Updated last year
- Official Implementation of "Probing Language Models for Pre-training Data Detection" ★20 · Updated last year
- A retrieval augmented sequence modeling toolkit implemented based on Fairseq ★29 · Updated 2 years ago
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ★50 · Updated 2 years ago
- Collection of papers for scalable automated alignment. ★93 · Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ★136 · Updated last year
- An easy-to-use kNN-MT toolkit ★106 · Updated 2 years ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ★90 · Updated last year
- Code & Data for our Paper "RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation" (EMNLP 2023) ★17 · Updated 2 years ago
- EMNLP'2024: Knowledge Verification to Nip Hallucination in the Bud ★23 · Updated last year
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ★47 · Updated last year
- ★147 · Updated last year
- The repository for the paper "Evaluating Open-QA Evaluation" ★25 · Updated last year
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ★137 · Updated 8 months ago
- Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023) ★31 · Updated 3 months ago
- ACL2023 (Oral): TemplateGEC: Improving Grammatical Error Correction with Detection Template ★22 · Updated 2 years ago
- CDQA: Chinese Dynamic Question Answering Benchmark ★17 · Updated last year
- ★27 · Updated 2 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ★101 · Updated 11 months ago
- [ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT ★91 · Updated 3 months ago
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models ★58 · Updated last year
- ★87 · Updated 2 years ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ★138 · Updated last year
- ★32 · Updated 2 years ago
- Repo for ACL2023 paper "Won't Get Fooled Again: Answering Questions with False Premises" ★22 · Updated 2 years ago