bamman-group / gpt4-books
Code and data to support "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4"
☆69 Updated last year
Alternatives and similar repositories for gpt4-books:
Users who are interested in gpt4-books are comparing it to the repositories listed below
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 Updated last year
- Code for our EMNLP '22 paper "Fixing Model Bugs with Natural Language Patches" ☆19 Updated 2 years ago
- https://footprints.baulab.info ☆17 Updated 5 months ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆21 Updated last year
- Learning to route instances for Human vs AI Feedback ☆21 Updated last month
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 2 years ago
- Documentation effort for the BookCorpus dataset ☆34 Updated 3 years ago
- ☆21 Updated 2 months ago
- Finding semantically meaningful and accurate prompts. ☆46 Updated last year
- ☆29 Updated last year
- Few-shot Learning with Auxiliary Data ☆27 Updated last year
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆32 Updated 2 years ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) ☆22 Updated last year
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) ☆30 Updated 9 months ago
- ☆11 Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆21 Updated last year
- Are foundation LMs multilingual knowledge bases? (EMNLP 2023) ☆19 Updated last year
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆61 Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆24 Updated 4 months ago
- The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data. ☆14 Updated 4 months ago
- ☆65 Updated last year
- Ludwig benchmark ☆20 Updated 3 years ago
- Repo to hold code and track issues for the collection of permissively licensed data ☆23 Updated this week
- ☆44 Updated 4 months ago
- Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM) ☆87 Updated last year
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper. ☆12 Updated 2 years ago
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆73 Updated 8 months ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆32 Updated 10 months ago
- Efficiently computing & storing token n-grams from large corpora ☆19 Updated 5 months ago
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations… ☆28 Updated last year