josephdviviano / whatsinthebox
analysis of public NLP corpora
☆11Updated 2 years ago
Alternatives and similar repositories for whatsinthebox:
Users who are interested in whatsinthebox are comparing it to the libraries listed below
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"☆28Updated 2 years ago
- MultiCite code and data. Models are available on Huggingface.☆31Updated 2 years ago
- Embedding Recycling for Language models☆38Updated last year
- Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost☆8Updated 11 months ago
- RATransformers 🐭- Make your transformer (like BERT, RoBERTa, GPT-2 and T5) Relation Aware!☆41Updated 2 years ago
- Few-shot Learning with Auxiliary Data☆27Updated last year
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…☆33Updated 11 months ago
- A weak supervision framework for (partial) labeling functions☆16Updated 9 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆65Updated 2 years ago
- Code for "Rissanen Data Analysis: Examining Dataset Characteristics via Description Length" by Ethan Perez, Douwe Kiela, and Kyunghyun Ch…☆35Updated 3 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆31Updated last year
- Source code and data for Like a Good Nearest Neighbor☆28Updated 3 months ago
- Are foundation LMs multilingual knowledge bases? (EMNLP 2023)☆19Updated last year
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022).☆24Updated 2 years ago
- The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.☆11Updated 3 years ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…☆14Updated last year
- Expertise modeling for the OpenReview matching system☆35Updated 2 weeks ago
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.☆42Updated last month
- Code for SaGe subword tokenizer (EACL 2023)☆24Updated 4 months ago
- [ACL 2023]: Training Trajectories of Language Models Across Scales https://arxiv.org/pdf/2212.09803.pdf☆23Updated last year
- Closed-form polynomial approximations to neural networks☆12Updated 2 months ago
- The Implementation for the Paper "Time-Stamped Language Model: Teaching Language Models to Understand The Flow of Events"☆11Updated 3 years ago
- Documentation effort for the BookCorpus dataset☆34Updated 3 years ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training.☆25Updated 5 months ago
- Semantically Structured Sentence Embeddings☆65Updated 6 months ago
- Tasks for describing differences between text distributions.☆16Updated 8 months ago