skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10Updated 8 years ago
Alternatives and similar repositories for web-crawler:
Users who are interested in web-crawler are comparing it to the libraries listed below.
- Collects multimedia content shared through social networks.☆19Updated 10 years ago
- Exploration Library in Java☆12Updated last year
- Templates for projects based on top of H2O.☆38Updated last month
- scalding powered machine learning☆109Updated 10 years ago
- Experiments with distributed matrix factorization. Presented at DataWorks Summit 2017, München.☆10Updated 7 years ago
- VoltDB Click Stream Processing Example.☆16Updated 7 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem☆12Updated 3 months ago
- word2vec-java☆7Updated 7 months ago
- ***Warning*** Old Apache Flink Graph API: This repository is not in use anymore.☆15Updated 9 years ago
- UberSocialNet—applying the Lambda Architecture☆30Updated 11 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase☆14Updated 9 years ago
- Implementation of the Chinese Whispers graph clustering algorithm☆8Updated 7 years ago
- Focused Crawler for VT's CTRNet☆10Updated 11 years ago
- Sparse feature extraction with Spark☆30Updated 6 years ago
- Real-time query spark and visualise it as graph.☆24Updated 7 years ago
- A collection of efficient utilities for a data scientist.☆41Updated 10 years ago
- A chef cookbook for deploying spark☆30Updated 12 years ago
- This project contains the code to translate between Apache Spark and SFrame.☆20Updated 8 years ago
- Greylock is an embedded search engine which is aimed at index size and performance☆12Updated 8 years ago
- Tweet Analysis with Spark☆15Updated 7 years ago
- presto-redis is an experimental sql layer for redis☆18Updated 10 years ago
- Easy distributed TensorFlow on Hadoop (moved to: hops-tensorflow)☆9Updated 8 years ago
- Example code for building your own MemSQL Streamliner Pipelines☆23Updated 8 years ago
- Invoke Pandas plotting by piping in SQL output via PSQL (Can be used with Postgres or Greenplum or any SQL engine).☆16Updated 10 years ago
- Includes Code for Inference and Evaluation of Topic Models for Selectional Preferences☆16Updated 2 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP☆15Updated 8 years ago
- iCQA - Intelligent Community Question Answering Framework☆31Updated 8 years ago
- Sparking Using Java8☆17Updated 10 years ago
- Deep learning certificate part 1☆10Updated 3 years ago
- Sample custom Nifi processor to process tcpdump☆18Updated 9 years ago