josephdviviano / whatsinthebox

analysis of public NLP corpora

☆11

Alternatives and similar repositories for whatsinthebox

Users who are interested in whatsinthebox are comparing it to the repositories listed below.

IBM / model-recycling
Ranking of fine-tuned HF models as base models.
☆35 Updated last month
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆38 Updated last year
martiansideofthemoon / relic-retrieval
Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).
☆20 Updated 3 years ago
peterbhase / SLAG-Belief-Updating
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"
☆28 Updated 3 years ago
i-machine-think / diagNNose
diagNNose is a Python library that facilitates a broad set of tools for analysing hidden activations of neural models.
☆82 Updated last year
JoaoLages / RATransformers
RATransformers 🐭- Make your transformer (like BERT, RoBERTa, GPT-2 and T5) Relation Aware!
☆41 Updated 2 years ago
g8a9 / ear
Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists"
☆49 Updated 3 years ago
allenai / multicite
MultiCite code and data. Models are available on Huggingface.
☆32 Updated 3 years ago
thakur-nandan / sprint
SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset.
☆45 Updated last year
BatsResearch / nplm
A weak supervision framework for (partial) labeling functions
☆16 Updated 11 months ago
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 Updated last year
allenai / tailor
☆31 Updated last year
allenai / aspire
Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.
☆53 Updated last year
UniversalNER / UniversalNER
☆27 Updated 4 months ago
jjzha / cartography-al
Code base for the EMNLP 2021 Findings paper: Cartography Active Learning
☆14 Updated 3 weeks ago
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆25 Updated 6 months ago
jwallat / knowledge-probing
Code for our BlackboxNLP'20 paper "BERTnesia: Investigating the capture and forgetting of knowledge in BERT"
☆9 Updated 3 years ago
shauli-ravfogel / rlace-icml
☆36 Updated 2 years ago
MichiganNLP / visual_diversity_budget
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
☆8 Updated last year
terrierteam / pyterrier_doc2query
☆37 Updated 6 months ago
EleutherAI / semantic-memorization
☆44 Updated 7 months ago
sophiaalthammer / parm
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu…
☆40 Updated 3 years ago
petezh / OpenD5
Tasks for describing differences between text distributions.
☆16 Updated 10 months ago
jpwahle / lrec22-d3-dataset
The official repository for the LREC 2022 paper "D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science …
☆27 Updated 2 years ago
inspired-cognition / critique-apps
Apps built using Inspired Cognition's Critique.
☆58 Updated 2 years ago
csinva / iprompt
Finding semantically meaningful and accurate prompts.
☆47 Updated last year
UKPLab / eacl2024-lagonn
Source code and data for Like a Good Nearest Neighbor
☆29 Updated 5 months ago
MilaNLProc / language-invariant-properties
☆21 Updated 3 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆93 Updated 2 years ago
allenai / smashed
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…
☆33 Updated last year