ContinuumIO / nutchpy
For interacting with nutch via Python
☆29 Updated 3 weeks ago
Alternatives and similar repositories for nutchpy
Users who are interested in nutchpy are comparing it to the libraries listed below.
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 Updated last year
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community. ☆39 Updated 9 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆92 Updated 9 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 Updated 5 years ago
- Mirror of Apache Stanbol (incubating) ☆115 Updated last year
- ☆21 Updated 9 years ago
- Topic modeling web application ☆40 Updated 10 years ago
- ☆92 Updated 10 years ago
- Looking at big data? Add a little salt. ☆59 Updated 2 years ago
- PyRDM is a Python-based library for research data management (RDM). It facilitates the automated publication of scientific software and a… ☆32 Updated 4 years ago
- Ranking Entity Types using the Web of Data ☆30 Updated 9 years ago
- This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading … ☆18 Updated last year
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆53 Updated 7 years ago
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features. ☆108 Updated 9 months ago
- Pattern-of-Behavior Search Tool ☆11 Updated 3 years ago
- stav text annotation visualiser ☆34 Updated 14 years ago
- A Utility Library for Wikipedia dumps ☆33 Updated 8 years ago
- Supporting infrastructure to run scientific experiments without a scientific workflow management system. ☆122 Updated 3 weeks ago
- Unified interface for local and distributed ndarrays ☆157 Updated 7 years ago
- Scientific Spark - a NASA AIST14 project ☆86 Updated 7 years ago
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea ☆13 Updated 9 years ago
- A repo that contains outgoing links from DBpedia ☆49 Updated 5 years ago
- An example project for doing grid search in MLlib ☆13 Updated 11 years ago
- ☆14 Updated 4 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 Updated 8 years ago
- VisTrails is an open-source data analysis and visualization tool. It provides a comprehensive provenance infrastructure that maintains de… ☆104 Updated 8 years ago
- HIPI: Hadoop Image Processing Interface ☆132 Updated 8 years ago
- A Topic Modeling toolbox ☆92 Updated 9 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆52 Updated 8 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆34 Updated 4 years ago