com3dian / Grobidmonkey
Grobidmonkey is an open-source package designed for postprocessing GROBID outputs.
☆12Updated last year
Alternatives and similar repositories for Grobidmonkey
Users who are interested in Grobidmonkey are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)☆74Updated 9 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2…☆70Updated 2 years ago
- Leveraging LLMs for Post-OCR Correction of Historical Newspapers☆15Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.☆56Updated 3 months ago
- A python true casing utility that restores case information for texts☆88Updated 3 years ago
- A spaCy custom component that extracts and normalizes temporal expressions☆56Updated 2 years ago
- Tool to fix bitexts and tag near-duplicates for removal☆34Updated 4 months ago
- ☆32Updated 2 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆64Updated last year
- A tiny BERT for low-resource monolingual models☆31Updated 2 weeks ago
- An accurate multilingual word aligner based on LaBSE☆24Updated 2 years ago
- Gamma Agreement in Python☆45Updated last year
- Source code for the Apple reproduction☆32Updated 4 years ago
- Neural Language Models for Historical Research☆29Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts.☆156Updated last year
- zero shot NER fine tuning☆13Updated 9 months ago
- Temporarily remove unused tokens during training to save RAM and improve speed.☆23Updated 6 months ago
- OpusFilter - Parallel corpus processing toolkit☆115Updated 3 weeks ago
- Build a dialog dataset from online books in many languages☆76Updated 3 years ago
- Personal information identification standard☆20Updated last year
- SegEval Segmentation Evaluation Package☆57Updated 2 years ago
- German GPT-2 model☆32Updated 4 years ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks☆23Updated 3 weeks ago
- ☆50Updated last year
- A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python☆112Updated 3 weeks ago
- ☆105Updated 4 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation.☆112Updated 7 months ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).☆15Updated last year
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆160Updated last year
- Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguatio…☆45Updated last year