josephdviviano / whatsinthebox
analysis of public NLP corpora
☆12 Updated last year
Related projects:
- A weak supervision framework for (partial) labeling functions ☆14 Updated 2 months ago
- Embedding Recycling for Language models ☆38 Updated last year
- ☆27 Updated last year
- Ranking of fine-tuned HF models as base models. ☆35 Updated last year
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆30 Updated 3 months ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 2 years ago
- Few-shot Learning with Auxiliary Data ☆26 Updated 9 months ago
- Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling ☆30 Updated 3 years ago
- Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OF… ☆25 Updated 2 years ago
- Code for "Rissanen Data Analysis: Examining Dataset Characteristics via Description Length" by Ethan Perez, Douwe Kiela, and Kyunghyun Ch… ☆35 Updated 3 years ago
- diagNNose is a Python library that facilitates a broad set of tools for analysing hidden activations of neural models. ☆81 Updated 10 months ago
- Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations" ☆31 Updated last year
- MultiCite code and data. Models are available on Huggingface. ☆28 Updated 2 years ago
- ☆13 Updated 10 months ago
- ☆35 Updated last year
- ☆19 Updated 2 years ago
- ☆16 Updated last year
- ☆30 Updated 4 years ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆50 Updated last year
- ☆20 Updated 3 years ago
- Adding new tasks to T0 without catastrophic forgetting ☆30 Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆42 Updated 10 months ago
- ☆23 Updated 2 weeks ago
- ☆66 Updated last month
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 Updated 2 years ago
- Documentation effort for the BookCorpus dataset ☆30 Updated 3 years ago
- ☆37 Updated 3 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆11 Updated 9 months ago
- ☆33 Updated 2 years ago
- Expertise modeling for the OpenReview matching system ☆31 Updated this week