USCDataScience / tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
☆21Updated last year
Alternatives and similar repositories for tika-dockers
Users who are interested in tika-dockers are comparing it to the repositories listed below
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test…☆73Updated 3 weeks ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr☆22Updated 6 years ago
- Highly performant, lightweight framework for linked data processing. Supports RDFa, JSON-LD, RDF/XML and plain text formats, runs on Andr…☆50Updated 3 years ago
- Advanced desktop search/corpus exploration prototype☆21Updated 4 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N…☆277Updated 3 years ago
- Advanced graph rewriting and LLOD publication for CoNLL and other TSV formats☆25Updated last month
- Highlighting various OCR formats directly in Solr☆87Updated 2 weeks ago
- Solr client and user interface for search☆22Updated last year
- Demonstration of searching PDF document with Solr, Tika, and Tesseract☆32Updated last year
- Java Wiktionary Library☆59Updated 3 years ago
- This is the facade for installation and access to the individual components☆15Updated 2 weeks ago
- An RDF plugin for Solr☆114Updated last year
- SOLR bulk indexing utility for the command line.☆45Updated 2 months ago
- Express.js middleware to support an LDP server built on MongoDB☆14Updated 4 years ago
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.☆250Updated this week
- SKOS Support for Apache Lucene and Solr☆56Updated 4 years ago
- Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika☆14Updated 8 years ago
- A pure JavaScript frontend for Elasticsearch search indices.☆80Updated 7 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation☆49Updated 3 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34Updated 2 years ago
- 🚀GUI for training spaCy models☆55Updated 4 years ago
- NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser☆51Updated 8 months ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them.☆38Updated last year
- Generate high-quality DOCX files using a simplified XML format (simple word processing XML).☆44Updated last month
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t…☆132Updated 2 months ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…☆52Updated 5 years ago
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS☆87Updated 8 years ago
- Federated Knowledge Extraction Framework☆193Updated 2 years ago
- Image comparison QA tool for digital preservation workflows.☆14Updated 11 years ago