fedelemantuano / tika-app-python
Python bindings for Apache Tika
☆23Updated 5 years ago
Alternatives and similar repositories for tika-app-python
Users who are interested in tika-app-python are comparing it to the libraries listed below.
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community.☆39Updated 9 years ago
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.☆108Updated 5 months ago
- Record Linkage ToolKit (Find and link entities)☆110Updated 2 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them.☆38Updated last year
- Trying to generate name synonyms from wikidata☆34Updated 5 years ago
- A toolkit for clustering web pages based on various similarity measures.☆34Updated 3 years ago
- Extraction Toolkit☆83Updated 3 years ago
- Babel Street Analytics Client Library for Python☆38Updated 6 months ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.☆45Updated last month
- The GATE Embedded core API and GATE Developer application☆88Updated 10 months ago
- Python library for information extraction of quantities from unstructured text☆119Updated 2 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri☆47Updated 3 years ago
- For extracting measurements and related entities from text☆58Updated 5 years ago
- An index data structure for approximate string search.☆23Updated 6 years ago
- Solr Dictionary Annotator (Microservice for Spark)☆71Updated 5 years ago
- General Architecture for Text Engineering☆49Updated 9 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture☆38Updated last year
- Hadoop integration code for working with Apache cTAKES☆10Updated 11 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N…☆272Updated 2 years ago
- GROBID extension for identifying and normalizing physical quantities.☆82Updated 3 months ago
- Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?)☆36Updated 11 years ago
- Entity Extraction Text Processor☆148Updated last year
- stav text annotation visualiser☆34Updated 13 years ago
- Evolutionary Graph Pattern Learner that learns SPARQL queries for a given set of source-target-pairs from an endpoint.☆91Updated 2 years ago
- Apache UIMA uimaFIT☆32Updated 9 months ago
- Scrapes the web. Gets the news.☆13Updated 9 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu…☆65Updated last year
- Analytic UIMA pipelines using Spark☆23Updated 9 years ago
- importing Thomson Reuters' permID dataset into Neo4j☆19Updated 7 years ago
- 🚀GUI for training spaCy models☆55Updated 4 years ago