open-evals / evals
Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
☆19 · Updated 2 years ago
Alternatives and similar repositories for evals:
Users who are interested in evals are comparing it to the libraries listed below.
- [EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches. (☆72, updated last year)
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… (☆31, updated last year)
- Repository for Skill Set Optimization (☆12, updated 9 months ago)
- Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI (☆57, updated last year)
- The "GPT-API-Accelerate" project provides a set of Python classes for accelerating the process of generating responses to prompts using t… (☆23, updated 6 months ago)
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" (☆23, updated last week)
- No description (☆25, updated last year)
- Transformers at any scale (☆41, updated last year)
- SILO Language Models code repository (☆81, updated last year)
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts (☆24, updated last year)
- A file utility for accessing both local and remote files through a unified interface. (☆41, updated 3 weeks ago)
- This repo is based on https://github.com/jiaweizzhao/GaLore (☆27, updated 7 months ago)
- Finding semantically meaningful and accurate prompts. (☆46, updated last year)
- No description (☆37, updated this week)
- My explorations into editing the knowledge and memories of an attention network (☆34, updated 2 years ago)
- ZYN: Zero-Shot Reward Models with Yes-No Questions (☆33, updated last year)
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions (☆69, updated 2 years ago)
- Tasks for describing differences between text distributions. (☆16, updated 9 months ago)
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)" (☆18, updated last year)
- Advanced Reasoning Benchmark Dataset for LLMs (☆46, updated last year)
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning (☆34, updated last year)
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" (☆42, updated 5 months ago)
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling (☆36, updated last year)
- Plug-and-play Search Interfaces with Pyserini and Hugging Face (☆31, updated last year)
- No description (☆15, updated 10 months ago)
- WorldSense benchmark for grounded reasoning in language models (☆18, updated last year)
- Experimental scripts for researching data adaptive learning rate scheduling. (☆23, updated last year)
- No description (☆27, updated last year)
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format (☆27, updated last year)
- Official Repository for Task-Circuit Quantization (☆19, updated last week)