b-cube / nutch-crawler
Apache Nutch fork tuned for web services and data discovery.
☆10 · Updated 10 years ago
Alternatives and similar repositories for nutch-crawler
Users who are interested in nutch-crawler are comparing it to the libraries listed below.
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆58 · Updated 4 years ago
- ☆42 · Updated 9 years ago
- Topic modeling web application ☆40 · Updated 9 years ago
- ☆25 · Updated 10 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 · Updated 8 years ago
- ☆92 · Updated 9 years ago
- ElasticSearch Prediction Generator and Plugin ☆22 · Updated 9 years ago
- A Topic Modeling toolbox ☆92 · Updated 9 years ago
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool ☆149 · Updated 3 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 · Updated 6 years ago
- Movie recommendations and more in MapReduce and Scalding ☆117 · Updated 12 years ago
- Elasticsearch Latent Semantic Indexing experimentation ☆33 · Updated 5 years ago
- See https://github.com/tworavens/tworavens for current repository for this project and http://2ra.vn for project pages. ☆30 · Updated 6 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆37 · Updated last year
- Machine Learning solution for Kaggle.com's "Partly Sunny with a Chance of Hashtags" ☆27 · Updated 11 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 · Updated 9 years ago
- Python binding for gumbo-parser using Cython ☆14 · Updated 8 years ago
- a set of services that provide NLP facilities ☆25 · Updated 4 years ago
- Task Orchestration Tool Based on SWF and boto3 ☆38 · Updated 6 years ago
- distributed latent dirichlet allocation ☆30 · Updated 13 years ago
- Seldon Spark Jobs ☆26 · Updated 10 years ago
- Docker images for data science from Wise.io ☆50 · Updated 9 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 · Updated 12 years ago
- mltk - Moz Language Tool Kit ☆12 · Updated 10 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 · Updated 5 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- Topic Modeling the Sarah Palin emails. ☆34 · Updated 13 years ago
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. ☆158 · Updated 2 years ago
- Library for Geo-Inferencing in Twitter Data ☆28 · Updated 8 years ago
- Nutch with Cassandra and Elasticsearch on Docker ☆17 · Updated 3 years ago