nlp-uoregon / mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
☆105 Updated 2 months ago
Related projects
Alternatives and complementary repositories for mlmm-evaluation
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆91 Updated last year
- ☆166 Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆96 Updated 6 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 Updated 7 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆114 Updated last month
- A Multilingual Replicable Instruction-Following Model ☆93 Updated last year
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆93 Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆86 Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆188 Updated 2 months ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆193 Updated 8 months ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆71 Updated 2 weeks ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 Updated 2 months ago
- A Survey on Data Selection for Language Models ☆178 Updated 3 weeks ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆78 Updated last week
- ACL 2023 - AlignScore, a metric for factual consistency evaluation. ☆110 Updated 7 months ago
- ☆120 Updated 2 months ago
- ☆66 Updated 9 months ago
- ☆24 Updated 4 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆89 Updated last month
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆65 Updated 8 months ago
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆159 Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆54 Updated 10 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆84 Updated 3 months ago
- GEMBA - GPT Estimation Metric Based Assessment ☆100 Updated 3 months ago
- Codebase, data and models for the SummaC paper in TACL ☆85 Updated 10 months ago
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆105 Updated last month
- ☆218 Updated 5 months ago
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆177 Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆65 Updated 3 years ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆230 Updated last year