skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10 Updated 9 years ago
Alternatives and similar repositories for web-crawler
Users who are interested in web-crawler are comparing it to the libraries listed below.
- Code for the CIKM 2013 paper "Discovering Coherent Topics Using General Knowledge" ☆11 Updated 11 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- Includes Code for Inference and Evaluation of Topic Models for Selectional Preferences ☆16 Updated 2 years ago
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 Updated 9 years ago
- word2vec-java ☆7 Updated 9 months ago
- Focused Crawler for VT's CTRNet ☆10 Updated 12 years ago
- Implicit relation extractor using a natural language model. ☆24 Updated 7 years ago
- Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements ☆14 Updated 3 years ago
- Algorithms that build k-nearest neighbors graph (k-nn graph): Brute-force, NN-Descent,... ☆34 Updated 6 years ago
- Uncharted Ensemble Clustering is a flexible multi-threaded clustering library for rapidly constructing tailored clustering solutions that… ☆32 Updated 10 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 Updated 8 years ago
- stav text annotation visualiser ☆34 Updated 13 years ago
- iCQA - Intelligent Community Question Answering Framework ☆31 Updated 8 years ago
- VoltDB Click Stream Processing Example. ☆16 Updated 7 years ago
- A demo of how to use PageRank with Hadoop and SociaLite to identify anomalies in Healthcare Data ☆47 Updated 9 years ago
- ☆20 Updated 8 years ago
- fuzzydb is a fuzzy matching database engine capable of providing human-like search results that make life much easier for users of websit… ☆20 Updated 2 years ago
- scalding powered machine learning ☆109 Updated 10 years ago
- Contains the implementation of algorithms that estimate the geographic location of media content based on their content and metadata. It … ☆15 Updated 8 years ago
- ReactiveLDA is a fast, lightweight implementation of the Latent Dirichlet Allocation (LDA) algorithm, using a parallel vanilla Gibbs samp… ☆61 Updated 10 years ago
- DEPRECATED! Use https://github.com/h2oai/sparkling-water repository! H2O and Spark interoperability based on Tachyon. ☆44 Updated 10 years ago
- A chef cookbook for deploying spark ☆30 Updated 12 years ago
- Sparse feature extraction with Spark ☆30 Updated 6 years ago
- A collection of efficient utilities for a data scientist. ☆41 Updated 10 years ago
- NLP Utilities in Java ☆43 Updated 2 years ago
- A pyLucene-based search module for searching books from goodreads.com ☆26 Updated 7 years ago
- ☆12 Updated 9 years ago
- ☆21 Updated 10 years ago
- ☆24 Updated 10 years ago
- Implementation of the Chinese Whispers graph clustering algorithm ☆8 Updated 7 years ago