qcri / LLMeBench
Benchmarking Large Language Models
☆96 Updated last month
Alternatives and similar repositories for LLMeBench
Users who are interested in LLMeBench are comparing it to the libraries listed below.
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆128 Updated last year
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆68 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆123 Updated 8 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆100 Updated last year
- Resources for cultural NLP research ☆95 Updated 3 weeks ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆80 Updated 2 months ago
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆98 Updated last year
- ☆38 Updated 2 years ago
- ☆71 Updated 7 months ago
- First explanation metric (diagnostic report) for text generation evaluation ☆61 Updated 2 months ago
- ☆45 Updated 9 months ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆171 Updated 5 months ago
- Interpreting Language Models with Contrastive Explanations (EMNLP 2022 Best Paper Honorable Mention) ☆62 Updated 3 years ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆127 Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 Updated last year
- A Multilingual Replicable Instruction-Following Model ☆93 Updated last year
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆55 Updated last year
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆71 Updated last year
- A curated list of research papers and resources on Cultural LLM. ☆42 Updated 7 months ago
- ☆47 Updated 11 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 7 months ago
- Token-level Reference-free Hallucination Detection ☆94 Updated last year
- ☆147 Updated last year
- ☆97 Updated 2 years ago
- ☆40 Updated 3 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 Updated 3 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆53 Updated 5 months ago
- Apps built using Inspired Cognition's Critique. ☆58 Updated 2 years ago
- GEMBA — GPT Estimation Metric Based Assessment ☆118 Updated 9 months ago
- Repository for research in the field of Responsible NLP at Meta. ☆199 Updated 5 months ago