proycon / analiticcl
an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
☆32 · Updated 3 months ago
Alternatives and similar repositories for analiticcl:
Users who are interested in analiticcl are comparing it to the libraries listed below.
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆29 · Updated 2 months ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot ☆13 · Updated 4 years ago
- Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing. ☆72 · Updated last year
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models ☆32 · Updated last year
- Rust binding to crfsuite ☆25 · Updated 2 years ago
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ☆17 · Updated this week
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆66 · Updated 4 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆60 · Updated 8 months ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆12 · Updated 10 months ago
- An efficient data structure for fast string similarity searches ☆22 · Updated 3 years ago
- OCRopus model for Gothic print (Fraktur) ☆18 · Updated 4 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆38 · Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆50 · Updated last month
- Generic Environment for Context-Aware Correction of Orthography ☆22 · Updated 2 years ago
- A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of ling… ☆15 · Updated last year
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Updated last year
- Process, enhance and evaluate multiple OCR output. ☆22 · Updated 2 months ago
- Discourse Analysis Tool Suite ☆18 · Updated this week
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆12 · Updated 5 months ago
- An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification ☆83 · Updated 10 months ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆111 · Updated this week
- Parser for KAF NAF files written in Python ☆16 · Updated 3 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 · Updated last year
- Open Access PDF harvester ☆35 · Updated 8 months ago
- Modular Rust transformer/LLM library using Candle ☆36 · Updated 8 months ago
- Keeping It Simple is Hard ☆10 · Updated 11 months ago
- An OCR evaluation tool ☆64 · Updated last month
- Make MP3 albums out of Academic PDFs. Works by gluing together Grobid and TTS offerings. ☆12 · Updated last year
- A suite of batches and tools for OCR tasks. ☆71 · Updated last year
- A software to detect text reuse with BLAST. ☆14 · Updated 5 years ago