mjbommar / gpt4-passes-the-bar
GPT-4 Passes the Bar
☆21 Updated 9 months ago
Related projects:
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆97 Updated 2 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆76 Updated 6 months ago
- Pretraining Efficiently on S2ORC! ☆135 Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆109 Updated last week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆107 Updated 2 weeks ago
- ☆73 Updated last year
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 Updated this week
- ☆38 Updated 5 months ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆106 Updated 10 months ago
- ☆91 Updated 5 months ago
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆105 Updated last year
- A repository for transformer critique learning and generation ☆84 Updated 9 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆96 Updated 10 months ago
- The codebase for our ACL2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni… ☆26 Updated last year
- Public Inflection Benchmarks ☆69 Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆65 Updated 2 months ago
- ☆47 Updated 3 weeks ago
- AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness ☆94 Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆118 Updated 6 months ago
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆90 Updated last year
- SILO Language Models code repository ☆80 Updated 6 months ago
- ☆81 Updated 3 months ago
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆19 Updated 8 months ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆59 Updated 10 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆77 Updated last month
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆143 Updated 2 months ago
- We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in … ☆42 Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆60 Updated last year
- ☆94 Updated last year