bamman-group / gpt4-books
Code and data to support "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4"
★68, updated last year
Alternatives and similar repositories for gpt4-books:
Users who are interested in gpt4-books are comparing it to the repositories listed below:
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ★65, updated 2 years ago
- ★21, updated 3 weeks ago
- Learning to route instances for Human vs AI Feedback ★19, updated last week
- A BERT-based application for reusable text classification at scale ★37, updated last year
- Ranking of fine-tuned HF models as base models. ★35, updated last year
- Finding semantically meaningful and accurate prompts. ★46, updated last year
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ★31, updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ★32, updated last year
- Documentation effort for the BookCorpus dataset ★33, updated 3 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ★31, updated 3 years ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ★32, updated last year
- ★67, updated 11 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ★46, updated last year
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency ★29, updated last year
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ★28, updated 2 years ago
- ★35, updated 2 years ago
- Source code and data for Like a Good Nearest Neighbor ★28, updated last month
- Tools for managing datasets for governance and training. ★82, updated 2 weeks ago
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ★14, updated last year
- BPE modification that implements removing of the intermediate tokens during tokenizer training. ★25, updated 2 months ago
- ★31, updated last year
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ★69, updated last year
- ★90, updated 8 months ago
- Code for SaGe subword tokenizer (EACL 2023) ★22, updated 2 months ago
- https://footprints.baulab.info ★16, updated 4 months ago
- MoodCat classifies the mood of English sentences. ★14, updated 2 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ★12, updated last year
- ★31, updated last year
- Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21) ★41, updated 3 years ago
- Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM) ★86, updated last year