langtech-bsc / AnonymizationPipeline
Anonymization pipeline for ingesting data from outside of BSC that contains GDPR-protected data.
☆13Updated last year
Alternatives and similar repositories for AnonymizationPipeline:
Users who are interested in AnonymizationPipeline are comparing it to the libraries listed below:
- A Python library aimed at dissecting and augmenting NER training data.☆58Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆57Updated 8 months ago
- The robust European language model benchmark.☆96Updated this week
- Generalist and Lightweight Model for Text Classification☆113Updated 2 weeks ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models.☆108Updated 10 months ago
- Framework for working with brat-annotated .ann files☆10Updated 10 months ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i…☆46Updated 11 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2…☆67Updated 2 years ago
- A High-level Library for Named Entity Recognition in Python.☆23Updated last year
- A spaCy custom component that extracts and normalizes temporal expressions☆54Updated 2 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆12Updated last year
- A list of awesome open source projects in the machine learning field, whose developers are mainly based in Germany☆43Updated 7 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.☆79Updated last year
- Building NER and RE components using HuggingFace Transformers☆50Updated 2 years ago
- Pre-production releases for spaCy in Catalan☆14Updated 3 years ago
- Notebooks for training universal 0-shot classifiers on many different tasks☆122Updated 3 months ago
- Course for Interpreting ML Models☆52Updated 2 years ago
- This repository contains the complete source code of the MedTAG annotation tool. MedTAG is a biomedical annotation tool for tagging biome…☆12Updated 2 years ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training☆63Updated 2 months ago
- Vespa application making an index of the CORD-19 dataset.☆39Updated 2 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆30Updated 2 years ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax.☆59Updated 11 months ago
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.☆106Updated 11 months ago
- Fact checking baseline combining dense retrieval and textual entailment☆28Updated 2 months ago
- REMERGE - Multi-Word Expression discovery algorithm☆14Updated 2 years ago
- NLP @ TU Wien☆17Updated 4 months ago