rsling / texrex
texrex web page cleaning & ClaraX random walk crawler
☆11 · Updated 3 years ago
Alternatives and similar repositories for texrex:
Users that are interested in texrex are comparing it to the libraries listed below
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages. ☆23 · Updated last year
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆25 · Updated 5 years ago
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆74 · Updated last week
- Named entity annotation tool ☆27 · Updated last year
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 · Updated 2 years ago
- Core libraries by the PRImA Research Lab ☆16 · Updated 6 months ago
- Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist ☆37 · Updated 5 years ago
- A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically-rich languages (Czech, … ☆34 · Updated 5 years ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆17 · Updated this week
- A tool for automatic spelling normalization ☆20 · Updated 4 years ago
- Linguistic and stylistic complexity measures for (literary) texts ☆79 · Updated last year
- Editor for aligned parallel texts (personal desktop application). ☆19 · Updated 4 years ago
- A Pythonic API and some command line tools to access the Transkribus server via its REST API ☆27 · Updated 2 years ago
- Named Entity Recognition ☆17 · Updated 3 months ago
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- CERberus -- guardian against character errors ☆27 · Updated last year
- Official releases of the PROIEL treebank of ancient Indo-European languages ☆37 · Updated last year
- A simple configurable tool for manipulating dependency trees. ☆13 · Updated last month
- ☆12 · Updated 2 years ago
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… ☆22 · Updated 6 months ago
- Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR. ☆35 · Updated last year
- OCRopus model for Gothic print (Fraktur) ☆18 · Updated 4 years ago
- You Actually Look Twice At it ☆30 · Updated 3 weeks ago
- English web corpus with 4M tokens and several annotation types ☆26 · Updated last year
- Latin texts annotated for named entities and NER tagger used for the Herodotos Project (Ohio State University / Ghent University) ☆10 · Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 · Updated 3 years ago
- Public repository for Coptic SCRIPTORIUM Corpora Releases ☆33 · Updated last month
- Repository for "Towards Robust Named Entity Recognition for Historic German" ☆18 · Updated 4 years ago
- Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel… ☆23 · Updated 4 years ago
- Multi Tier Annotation Search ☆26 · Updated 3 years ago