IBM / ensemble-instruct
Codebase release for the EMNLP 2023 paper publication
★19 · Updated 4 months ago
Alternatives and similar repositories for ensemble-instruct
Users who are interested in ensemble-instruct are comparing it to the libraries listed below.
- Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ★212 · Updated 2 weeks ago
- ★43 · Updated last year
- ★38 · Updated last year
- Small and Efficient Mathematical Reasoning LLMs ★73 · Updated 2 years ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ★66 · Updated 2 years ago
- Functional Benchmarks and the Reasoning Gap ★89 · Updated last year
- Retrieval Augmented Generation Generalized Evaluation Dataset ★61 · Updated 6 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ★107 · Updated 2 years ago
- Synthetic Data Generation for Evaluation ★13 · Updated 11 months ago
- Evaluating LLMs with CommonGen-Lite ★93 · Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ★136 · Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ★115 · Updated 7 months ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ★192 · Updated 6 months ago
- ★17 · Updated 10 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ★31 · Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing ★32 · Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers ★60 · Updated last year
- ★23 · Updated 2 years ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ★61 · Updated 7 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ★86 · Updated last year
- A package dedicated for running benchmark agreement testing ★17 · Updated 4 months ago
- ★59 · Updated last year
- 🚢 Data Toolkit for Sailor Language Models ★95 · Updated 11 months ago
- Code for "Democratizing Reasoning Ability: Tailored Learning from Large Language Model", EMNLP 2023 ★36 · Updated 2 years ago
- ★56 · Updated last year
- ★130 · Updated last year
- Open Implementations of LLM Analyses ★107 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ★47 · Updated 2 years ago
- Code for NeurIPS LLM Efficiency Challenge ★60 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ★61 · Updated last year