mgedmin / pdf2html
Wrapper for pdftohtml that tries to extract paragraph structure
☆49 Updated 5 years ago
Related projects
Alternatives and complementary repositories for pdf2html
- Editor of training sets for page segmentation and zone classification of scholarly PDFs ☆11 Updated 8 years ago
- An efficient data structure for fast string similarity searches ☆23 Updated 3 years ago
- Clone of https://code.google.com/p/splitta/ so it can be a git submodule ☆34 Updated 11 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆110 Updated 4 months ago
- ☆40 Updated 6 years ago
- NLP pipeline software using common workflow language ☆34 Updated 5 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆64 Updated 4 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- A tool for visualizing trees, tailored specifically to the analysis of parse trees. ☆81 Updated 4 years ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 6 years ago
- Annodoc annotation documentation support system ☆35 Updated 4 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆50 Updated 9 years ago
- Python search module for fast approximate string matching ☆53 Updated last year
- An open-source CRF Reference String Parsing Package ☆155 Updated 4 years ago
- High-level build project for all LAPDF-Text submodules ☆103 Updated 9 years ago
- Terminology Extraction Program (English Version) ☆41 Updated 3 months ago
- PDF parser and converter to HTML ☆83 Updated last month
- CoNLL-U format library for JavaScript ☆72 Updated 7 years ago
- Cython wrapper on Hunspell Dictionary ☆23 Updated 10 months ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 Updated 7 years ago
- PDF to XML ALTO file converter ☆215 Updated last month
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆50 Updated 4 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆60 Updated 5 months ago
- Python interface for mate tools ☆16 Updated 6 years ago
- Python binding to libpoppler with focus on text extraction ☆98 Updated 2 years ago
- Machine learning software for extracting information from scholarly documents ☆23 Updated 3 years ago
- Annotation management for Prodigy that supports multiple users working on many projects ☆15 Updated 5 years ago
- Parser for KAF NAF files written in Python ☆16 Updated 3 years ago