ContinuumIO / nutchpy
For interacting with nutch via Python
☆29 Updated last week
Alternatives and similar repositories for nutchpy
Users that are interested in nutchpy are comparing it to the libraries listed below
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community. ☆39 Updated 9 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 Updated last year
- Topic modeling web application ☆40 Updated 10 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆92 Updated 10 years ago
- Mirror of Apache Stanbol (incubating) ☆116 Updated last year
- Solr Dictionary Annotator (Microservice for Spark) ☆71 Updated 6 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 Updated 8 years ago
- ☆14 Updated 4 years ago
- Low-level primitives for collapsed Gibbs sampling in python and C++ ☆33 Updated last year
- Pattern-of-Behavior Search Tool ☆11 Updated 3 years ago
- ☆21 Updated 10 years ago
- Unified interface for local and distributed ndarrays ☆157 Updated 7 years ago
- A system for connecting language to space and time. ☆64 Updated 5 years ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json dump. Questions? https://gitter.im/idio-opensource/Lobby ☆17 Updated 3 years ago
- A Topic Modeling toolbox ☆92 Updated 9 years ago
- An example project for doing grid search in MLlib ☆13 Updated 11 years ago
- NLP toolkit (tokenizer, POS-tagger, parser, etc.) ☆43 Updated 8 years ago
- GPU Acceleration for Apache Spark ☆34 Updated 10 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 4 years ago
- Vector Space Model Framework developed for InPhO ☆39 Updated 9 months ago
- A repo that contains outgoing links from DBpedia ☆49 Updated 5 years ago
- Supporting infrastructure to run scientific experiments without a scientific workflow management system. ☆123 Updated last month
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea ☆13 Updated 9 years ago
- Distributed Numpy ☆148 Updated 8 years ago
- General Architecture for Text Engineering ☆49 Updated 9 years ago
- Scientific Spark - a NASA AIST14 project ☆86 Updated 7 years ago
- Latent Dirichlet Allocation for topic modeling of streamed data sources ☆101 Updated 10 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 8 years ago
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆53 Updated 7 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 Updated 8 years ago