prometheus-eval / prometheus
[ICLR 2024 & NeurIPS 2023 WS] An evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Designed specifically for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
☆299 Updated last year
Alternatives and similar repositories for prometheus
Users who are interested in prometheus compare it to the repositories listed below.
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆217 Updated last year
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆243 Updated last year
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning" ☆205 Updated 6 months ago
- Official repository for ORPO ☆455 Updated last year
- ☆520 Updated 7 months ago
- Benchmarking library for RAG ☆209 Updated 2 weeks ago
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆187 Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆162 Updated last year
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆185 Updated 6 months ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆268 Updated last year
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆219 Updated last year
- ☆283 Updated last year
- Generative Representational Instruction Tuning ☆654 Updated 3 months ago
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ☆490 Updated 8 months ago
- Inquisitive Parrots for Search ☆193 Updated 3 weeks ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆537 Updated last year
- Scalable training for dense retrieval models. ☆298 Updated 2 weeks ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆498 Updated 5 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆472 Updated this week
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆353 Updated last year
- Reverse Instructions to generate instruction tuning data with corpus examples ☆213 Updated last year
- ☆150 Updated last year
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆202 Updated last week
- [Preprint] Learning to Filter Context for Retrieval-Augmented Generation ☆193 Updated last year
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆546 Updated last year
- This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as… ☆352 Updated last year
- Source code for the paper "Active Prompting with Chain-of-Thought for Large Language Models" ☆241 Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆239 Updated 4 months ago
- Forward-Looking Active REtrieval-augmented generation (FLARE) ☆636 Updated last year
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆251 Updated 2 years ago