skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10 Updated 9 years ago
Alternatives and similar repositories for web-crawler
Users who are interested in web-crawler are comparing it to the libraries listed below.
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 Updated 9 years ago
- Code for the CIKM 2013 paper "Discovering Coherent Topics Using General Knowledge" ☆11 Updated 11 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form. ☆32 Updated 10 years ago
- NLP Utilities in Java ☆43 Updated 2 years ago
- Parses Solr's log file to get some basic query statistics ☆20 Updated 6 years ago
- A project to demonstrate maximum entropy models for extracting quotes from news articles in Python. ☆25 Updated 13 years ago
- Tools for building a Lucene index for Semantic Vectors ☆21 Updated 10 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 Updated 8 years ago
- A toolkit that wraps various natural language processing implementations behind a common interface. ☆101 Updated 7 years ago
- Storm / Solr Integration ☆19 Updated last year
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 Updated 8 years ago
- scalding powered machine learning ☆109 Updated 10 years ago
- Distributed implementation of Robust PLSA using Spark ☆12 Updated 4 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- Reference implementations of data-intensive algorithms in MapReduce and Spark ☆82 Updated 7 years ago
- t test ☆10 Updated 11 years ago
- A set of methods that predict the future values of popularity indices for news posts using a variety of features. ☆33 Updated 7 years ago
- An implementation of gibbs sampling for Latent Dirichlet Allocation ☆30 Updated 14 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene ☆58 Updated 12 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆93 Updated 9 years ago
- A demo of how to use PageRank with Hadoop and SociaLite to identify anomalies in Healthcare Data ☆47 Updated 9 years ago
- python library for interacting with SolrCloud ☆36 Updated 4 years ago
- ☆20 Updated 8 years ago
- Parallelized Online Matrix Factorization for Collaborative Filtering using Stochastic Gradient Descent ☆43 Updated 9 years ago
- Stand-alone recommender system from Myrrix ☆109 Updated last year
- Elasticsearch Latent Semantic Indexing experimentation ☆33 Updated 5 years ago
- System for mining Wikipedia Usage data to read our collective mind ☆21 Updated 10 years ago
- Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers. ☆30 Updated 10 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 Updated 8 years ago
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago