mjbommar / gpt4-passes-the-bar
GPT-4 Passes the Bar
☆26 Updated last year
Alternatives and similar repositories for gpt4-passes-the-bar:
Users who are interested in gpt4-passes-the-bar are comparing it to the repositories listed below.
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 Updated 2 years ago
- A dataset for pretraining language models targeted for legal tasks. ☆131 Updated 2 years ago
- Ludwig benchmark ☆20 Updated 3 years ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆118 Updated last year
- ☆40 Updated 2 months ago
- A dataset of alignment research and code to reproduce it ☆77 Updated last year
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆106 Updated 5 months ago
- Code for constructing TLDR corpus from Reddit dataset ☆27 Updated 3 years ago
- Factored Cognition Primer: How to write compositional language model programs ☆48 Updated 2 years ago
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆87 Updated 2 years ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆114 Updated 7 months ago
- Source codes for the paper "Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints" ☆28 Updated 2 years ago
- ☆93 Updated 4 months ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 Updated 2 years ago
- ☆15 Updated 2 weeks ago
- ☆17 Updated last year
- ☆25 Updated 3 weeks ago
- ☆178 Updated 2 years ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 Updated last year
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆17 Updated last year
- The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI". ☆31 Updated last year
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆53 Updated 5 months ago
- Large-language Model Evaluation framework with Elo Leaderboard and A-B testing ☆52 Updated 6 months ago
- ☆53 Updated 4 months ago
- ☆93 Updated 11 months ago
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs. EMNLP 2024 ☆20 Updated 5 months ago
- ☆51 Updated last year
- Pretraining Efficiently on S2ORC! ☆161 Updated 6 months ago
- The codebase for our ACL2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni… ☆29 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 8 months ago