huggingface / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆17 Updated last week
Related projects
Alternatives and complementary repositories for lm-evaluation-harness
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 Updated 3 weeks ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated last month
- Official implementation for 'Extending LLMs' Context Window with 100 Samples' ☆73 Updated 9 months ago
- ☆73 Updated 10 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆49 Updated last week
- ☆37 Updated last year
- ☆92 Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 Updated 2 weeks ago
- ☆43 Updated 3 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 Updated 9 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 Updated last month
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆53 Updated 7 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 Updated 7 months ago
- ☆44 Updated 2 months ago
- Retrieval-Augmented Generation battle! ☆44 Updated last month
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆36 Updated last month
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆22 Updated 8 months ago
- ☆111 Updated last month
- Benchmarking library for RAG ☆113 Updated this week
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆114 Updated this week
- ☆56 Updated 8 months ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆149 Updated 4 months ago
- Sakura-SOLAR-DPO: Merge, SFT, and DPO ☆115 Updated 10 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 Updated 8 months ago
- Data Toolkit for Sailor Language Models ☆81 Updated 4 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆92 Updated last year
- Finetune mistral-7b-instruct for sentence embeddings ☆70 Updated 6 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 Updated 4 months ago