skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10 Updated 8 years ago
Alternatives and similar repositories for web-crawler:
Users who are interested in web-crawler are comparing it to the libraries listed below.
- Focused Crawler for VT's CTRNet ☆10 Updated 11 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 9 years ago
- Code for the CIKM 2013 paper "Discovering Coherent Topics Using General Knowledge" ☆11 Updated 10 years ago
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 Updated 9 years ago
- word2vec-java ☆7 Updated 3 months ago
- Experiments with distributed matrix factorization. Presented at DataWorks Summit 2017, München. ☆10 Updated 6 years ago
- NLP Utilities in Java ☆43 Updated 2 years ago
- Implementation of the Chinese Whispers graph clustering algorithm ☆8 Updated 7 years ago
- Sparking Using Java8 ☆17 Updated 9 years ago
- scalding powered machine learning ☆109 Updated 10 years ago
- Notes from Stanford NLP class ☆24 Updated 11 years ago
- Vizlinc ☆14 Updated 9 years ago
- Mirror of Apache Hadoop common ☆15 Updated 4 years ago
- Experiment code for AAAI paper: A Neural Probabilistic Model for Context Based Citation Recommendation ☆9 Updated 7 years ago
- Exploration Library in Java ☆12 Updated last year
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 Updated 8 years ago
- A collection of efficient utilities for a data scientist. ☆41 Updated 9 years ago
- Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements ☆13 Updated 2 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 Updated 8 years ago
- Data Science in Scala - Conf. Talk Repo ☆15 Updated 8 years ago
- Algorithms that build k-nearest neighbors graph (k-nn graph): Brute-force, NN-Descent,... ☆34 Updated 5 years ago
- Programming assignments for Introduction to Recommendation Systems course on Coursera.org ☆16 Updated 3 years ago
- Graph algorithms implemented in GraphX and Spark styles ☆15 Updated 9 years ago
- iCQA - Intelligent Community Question Answering Framework ☆32 Updated 8 years ago
- ☆11 Updated 10 years ago
- Llama - Low Latency Application MAster ☆34 Updated 2 years ago
- Storm / Solr Integration ☆19 Updated 11 months ago
- Tools to evaluate accuracies of various (research papers') metadata extraction libraries ☆11 Updated 9 years ago
- Python functions for popular relevance metrics (ndcg, err, etc) ☆15 Updated last year