skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10 Updated 9 years ago
Alternatives and similar repositories for web-crawler
Users who are interested in web-crawler are comparing it to the libraries listed below.
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 Updated 10 years ago
- Code for the CIKM 2013 paper "Discovering Coherent Topics Using General Knowledge" ☆11 Updated 11 years ago
- Apache Nutch extensions ☆34 Updated 3 years ago
- A project to demonstrate maximum entropy models for extracting quotes from news articles in Python. ☆26 Updated 13 years ago
- NLP Utilities in Java ☆43 Updated 3 years ago
- Focused Crawler for VT's CTRNet ☆10 Updated 12 years ago
- A demo of how to use PageRank with Hadoop and SociaLite to identify anomalies in Healthcare Data ☆47 Updated 10 years ago
- Spark Tutorial at the University of Maryland ☆38 Updated 11 years ago
- Parses Solr's log file to get some basic query statistics ☆20 Updated 7 years ago
- Vowpal Wabbit Webservice. A web service that accepts VW formatted text and runs it through a VW daemon instance. ☆40 Updated 9 years ago
- Storm / Solr Integration ☆19 Updated last year
- ReactiveLDA is a fast, lightweight implementation of the Latent Dirichlet Allocation (LDA) algorithm, using a parallel vanilla Gibbs samp… ☆61 Updated 10 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 Updated 9 years ago
- t test ☆10 Updated 11 years ago
- Script to perform dictionary based n-gram text tagging efficiently in apache spark ☆11 Updated 9 years ago
- This is my main Java library for all kinds of datastructures, algorithms and everything else that I need. ☆73 Updated 2 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 Updated 9 years ago
- An implementation of gibbs sampling for Latent Dirichlet Allocation ☆30 Updated 14 years ago
- A toolkit that wraps various natural language processing implementations behind a common interface. ☆101 Updated 8 years ago
- GPU Acceleration for Apache Spark ☆34 Updated 10 years ago
- Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers. ☆30 Updated 10 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form. ☆32 Updated 10 years ago
- Tools for building a Lucene index for Semantic Vectors ☆21 Updated 10 years ago
- Social Media Data Mining and Analytics - HyperLogLog, BloomFilter and CountMinSketch with Scalding & Algebird ☆27 Updated 7 years ago
- A Stanford CoreNLP server, with example clients, using Apache Thrift. ☆47 Updated 7 years ago
- LASER-A Scalable Response Prediction Platform For Online Advertising ☆48 Updated 11 years ago
- ☆35 Updated 12 years ago
- Stand-alone recommender system from Myrrix ☆109 Updated 2 years ago
- ☆15 Updated 10 years ago