qcri / LLMeBench
Benchmarking Large Language Models
☆94 · Updated last week
Alternatives and similar repositories for LLMeBench:
Users who are interested in LLMeBench are comparing it to the libraries listed below.
- Multilingual Large Language Models Evaluation Benchmark ☆119 · Updated 7 months ago
- A Multilingual Replicable Instruction-Following Model ☆93 · Updated last year
- ☆41 · Updated 2 months ago
- Resources for cultural NLP research ☆86 · Updated 2 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆126 · Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆100 · Updated 11 months ago
- Token-level Reference-free Hallucination Detection ☆94 · Updated last year
- ☆43 · Updated 9 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆214 · Updated 4 months ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆68 · Updated last year
- Tools for managing datasets for governance and training. ☆83 · Updated last month
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆166 · Updated 3 months ago
- Finetune mistral-7b-instruct for sentence embeddings ☆81 · Updated 10 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆76 · Updated 6 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆73 · Updated 5 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆94 · Updated last year
- A curated list of research papers and resources on Cultural LLM. ☆41 · Updated 6 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 · Updated 7 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆149 · Updated 10 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆103 · Updated last year
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆71 · Updated last year
- Open Implementations of LLM Analyses ☆102 · Updated 5 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆124 · Updated last year
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆55 · Updated 11 months ago
- Repository for the EMNLP 2024 conference ☆8 · Updated 4 months ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆76 · Updated 3 weeks ago
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆21 · Updated last year
- ☆38 · Updated 11 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆95 · Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 · Updated last year