NorskRegnesentral / NeuralTextSanitizer
Neural models for detecting and masking personal information from texts
☆14 · Updated last year
Related projects
Alternatives and complementary repositories for NeuralTextSanitizer
- ParaNames: A multilingual resource for parallel names ☆30 · Updated 5 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · Updated 3 years ago
- A survey of corpora for Germanic low-resource languages and dialects ☆24 · Updated 3 months ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆11 · Updated 11 months ago
- ☆24 · Updated 4 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆22 · Updated last month
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆20 · Updated 3 weeks ago
- ☆12 · Updated 2 years ago
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆37 · Updated 2 years ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer). ☆13 · Updated 5 months ago
- Emory Language and Information Toolkit ☆37 · Updated last year
- ☆15 · Updated last year
- Multilingual Open Text ☆25 · Updated 2 weeks ago
- ☆27 · Updated 2 months ago
- Ranking of fine-tuned HF models as base models. ☆35 · Updated last year
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆81 · Updated last month
- CrossRE: A Cross-Domain Dataset for Relation Extraction (Findings of EMNLP 2022) ☆47 · Updated 2 months ago
- ☆13 · Updated 3 years ago
- Evaluation code and data for "Automatic Correction of Human Translations" [NAACL 2022]. ☆19 · Updated last year
- ☆73 · Updated 3 years ago
- Code for pre-training CharacterBERT models (as well as BERT models). ☆34 · Updated 3 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆31 · Updated 2 years ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 · Updated 3 years ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆19 · Updated 4 months ago
- Fine-grained sentiment annotations of NoReC ☆20 · Updated 2 years ago
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here. ☆25 · Updated last year
- Data programming by demonstration for information extraction and span annotation ☆35 · Updated 3 years ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆66 · Updated 3 years ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆50 · Updated last year
- Data for the HIPE 2022 shared task. ☆15 · Updated 11 months ago