com3dian / Grobidmonkey
grobidmonkey is an open-source package for postprocessing GROBID output.
☆11Updated last year
Alternatives and similar repositories for Grobidmonkey:
Users who are interested in Grobidmonkey compare it to the libraries listed below.
- Finding mentions and citations to named and implicit research datasets from within the academic literature☆24Updated 6 months ago
- FrugalScore is an approach to learn a fixed, low cost version of any expensive NLG metric, while retaining most of its original performan…☆15Updated 2 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆58Updated 9 months ago
- A small python library to parse and write TSV files generated by the WebAnno software.☆12Updated 2 weeks ago
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23)☆16Updated 10 months ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).☆14Updated 10 months ago
- Neural Language Models for Historical Research☆25Updated 6 months ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.☆52Updated last year
- Searching in-memory corpus with Corpus Query Language (CQL)☆19Updated 4 months ago
- A simple toolkit for conducting analyses using corpus methods☆25Updated 3 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i…☆46Updated last year
- Tool to fix bitexts and tag near-duplicates for removal☆30Updated 2 months ago
- BERT and ELECTRA models trained on Europeana Newspapers☆38Updated 3 years ago
- coFR: COreference resolution tool for FRench (and singletons).☆24Updated 4 years ago
- NTREX -- News Test References for MT Evaluation☆83Updated 10 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)☆74Updated 3 weeks ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.☆17Updated 8 months ago
- German Alpaca Dataset (Cleaned + Translated)☆24Updated 2 years ago
- Automatically detect errors in annotated corpora.☆47Updated last year
- A Named-Entity Recogniser based on Grobid.☆52Updated 7 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2…☆67Updated 2 years ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.☆31Updated last year
- List of corpora annotated for coreference for different languages☆17Updated 8 months ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions☆19Updated 2 years ago
- ☆9Updated last year
- Data for the HIPE 2022 shared task.☆17Updated last year
- An accurate multilingual word aligner based on LaBSE☆21Updated last year
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings☆42Updated last year
- Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguatio…☆44Updated last year
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆12Updated last year