qcri / LLMeBench
Benchmarking Large Language Models
☆80 · Updated last month
Related projects
Alternatives and complementary repositories for LLMeBench
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 7 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆101 · Updated last year
- Finetune mistral-7b-instruct for sentence embeddings ☆70 · Updated 6 months ago
- Resources for cultural NLP research ☆61 · Updated last week
- Token-level Reference-free Hallucination Detection ☆92 · Updated last year
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆65 · Updated 8 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆96 · Updated 6 months ago
- Pretraining Efficiently on S2ORC! ☆136 · Updated 3 weeks ago
- A Multilingual Replicable Instruction-Following Model ☆93 · Updated last year
- The LM Contamination Index is a manually created database of contamination evidences for LMs. ☆75 · Updated 7 months ago
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆54 · Updated 6 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 · Updated 3 months ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ☆36 · Updated 8 months ago
- Multilingual Large Language Models Evaluation Benchmark ☆105 · Updated 2 months ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆71 · Updated 2 weeks ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆156 · Updated 6 months ago
- Train Llama 2 & 3 on the SQuAD v2 task as an example of how to specialize a generalized (foundation) model. ☆47 · Updated 5 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆188 · Updated 2 months ago
- Code and dataset for the EMNLP paper titled "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction" ☆49 · Updated 10 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 · Updated last month
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆149 · Updated 4 months ago
- Repo for "On Learning to Summarize with Large Language Models as References" ☆42 · Updated last year
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆177 · Updated 2 years ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆141 · Updated 5 months ago