skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10 · Updated 8 years ago
Related projects:
- Collects multimedia content shared through social networks. ☆19 · Updated 9 years ago
- Tweet Analysis with Spark ☆15 · Updated 7 years ago
- Experiments with distributed matrix factorization. Presented at DataWorks Summit 2017, München. ☆10 · Updated 6 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆21 · Updated 8 years ago
- Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure ☆45 · Updated 2 years ago
- Examples for Fast Data Processing with Spark ☆59 · Updated 11 years ago
- word2vec-java ☆7 · Updated 3 years ago
- Focused Crawler for VT's CTRNet ☆10 · Updated 11 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 · Updated 7 years ago
- Sample custom Nifi processor to process tcpdump ☆18 · Updated 8 years ago
- Experimental logistic regression code supporting multiple result categories, many levels of categorical modeling variables, good optimiza… ☆35 · Updated 3 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 · Updated 7 years ago
- NLP Utilities in Java ☆43 · Updated last year
- Algorithms that build k-nearest neighbors graph (k-nn graph): Brute-force, NN-Descent,... ☆34 · Updated 5 years ago
- Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements ☆12 · Updated 2 years ago
- A chef cookbook for deploying spark ☆30 · Updated 11 years ago
- Sparking Using Java8 ☆16 · Updated 9 years ago
- ☆24 · Updated 9 years ago
- System for mining Wikipedia Usage data to read our collective mind ☆21 · Updated 9 years ago
- Storm / Solr Integration ☆19 · Updated 7 months ago
- Code for the CIKM 2013 paper "Discovering Coherent Topics Using General Knowledge" ☆11 · Updated 10 years ago
- Set of Hadoop, Spark and Storm based tools for web and customer analytics ☆34 · Updated 3 years ago
- A collection of efficient utilities for a data scientist. ☆40 · Updated 9 years ago
- A package full of linear algebra operators for Apache Spark MLlib's linalg package ☆10 · Updated 9 years ago
- General Vectorization Lib for Machine Learning Tools ☆31 · Updated 8 years ago
- Python and Scala APIs for enhanced Spark analytics ☆11 · Updated 7 years ago
- Exploration Library in Java ☆12 · Updated last year
- A Java framework to build semantics-aware autoencoder neural network from a knowledge-graph. ☆13 · Updated 6 years ago
- MIT Big Data Challenge ☆14 · Updated 10 years ago
- Clustering documents based on LSH ☆14 · Updated 8 years ago