qcri / LLMeBench
Benchmarking Large Language Models
☆99 Updated this week
Alternatives and similar repositories for LLMeBench
Users who are interested in LLMeBench are comparing it to the libraries listed below:
- ☆41 Updated 5 months ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆69 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆124 Updated 10 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆129 Updated last year
- A Multilingual Replicable Instruction-Following Model ☆93 Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆101 Updated last year
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆81 Updated this week
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆185 Updated 6 months ago
- ☆51 Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆84 Updated 10 months ago
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆40 Updated 8 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆128 Updated last year
- Resources for cultural NLP research ☆97 Updated 2 months ago
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation ☆79 Updated last month
- A dataset focused on summarization of dialogs, which represents the rich domain of Twitter customer care conversations ☆32 Updated last year
- Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022) ☆114 Updated 2 years ago
- Codebase, data and models for the SummaC paper in TACL ☆96 Updated 4 months ago
- The FLORES+ Machine Translation Benchmark ☆105 Updated 7 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆104 Updated 2 years ago
- ☆39 Updated 2 years ago
- A curated list of research papers and resources on Cultural LLM. ☆44 Updated 8 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆48 Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆90 Updated last year
- ☆71 Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆59 Updated 10 months ago
- ☆27 Updated 3 weeks ago
- ☆76 Updated 8 months ago
- ☆22 Updated 3 years ago
- SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 di… ☆35 Updated last year