open-evals / evals
Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
☆19 Updated 2 years ago
Alternatives and similar repositories for evals
Users interested in evals are comparing it to the libraries listed below.
- ☆17 Updated 2 months ago
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches. ☆74 Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆29 Updated 9 months ago
- Code repository for the c-BTM paper ☆106 Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated last year
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs ☆48 Updated 11 months ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆81 Updated last year
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆35 Updated last year
- Repository for Skill Set Optimization ☆13 Updated 11 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- ☆50 Updated 3 weeks ago
- ☆14 Updated 3 years ago
- ROUGE score calculator with Traditional Chinese word segmentation ☆9 Updated 4 years ago
- ☆35 Updated last year
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆23 Updated 7 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆81 Updated 10 months ago
- Transformers at any scale ☆41 Updated last year
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 Updated 11 months ago
- Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆87 Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 Updated this week
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) ☆33 Updated last year
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆23 Updated last month
- ☆22 Updated this week
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆47 Updated last year
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 Updated this week
- ☆27 Updated this week
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 9 months ago