skalmadka / web-crawler
Distributed Web Crawler, Parser and Search Engine.
☆10 Updated 8 years ago
Alternatives and similar repositories for web-crawler:
Users who are interested in web-crawler are comparing it to the repositories listed below
- Sparking Using Java8 ☆17 Updated 10 years ago
- A chef cookbook for deploying spark ☆30 Updated 12 years ago
- Exploration Library in Java ☆12 Updated last year
- NLP Utilities in Java ☆43 Updated 2 years ago
- word2vec-java ☆7 Updated 6 months ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- Experiments with distributed matrix factorization. Presented at DataWorks Summit 2017, München. ☆10 Updated 7 years ago
- Examples for Fast Data Processing with Spark ☆59 Updated 11 years ago
- Templates for projects based on top of H2O. ☆38 Updated last month
- Focused Crawler for VT's CTRNet ☆10 Updated 11 years ago
- Code for the CIKM 2013 paper "Discovering Coherent Topics Using General Knowledge" ☆11 Updated 10 years ago
- A collection of efficient utilities for a data scientist. ☆41 Updated 9 years ago
- Tweet Analysis with Spark ☆15 Updated 7 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark ☆51 Updated 11 years ago
- Set of real time stream processing algorithms that can be used by big data streaming platform ☆72 Updated 4 years ago
- ☆24 Updated 10 years ago
- Storm / Solr Integration ☆19 Updated last year
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 Updated 9 years ago
- PredictionIO word2vec engine template (Scala-based parallelized engine) ☆12 Updated 9 years ago
- Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements ☆14 Updated 3 years ago
- Distributed optimization framework with parameter server ☆23 Updated 9 years ago
- View Zookeeper znode tree in a browser ☆26 Updated 9 years ago
- Example code for building your own MemSQL Streamliner Pipelines ☆23 Updated 8 years ago
- scalding powered machine learning ☆109 Updated 10 years ago
- Backup of papers read ☆17 Updated 8 years ago
- Set of Hadoop, Spark and Storm based tools for web and customer analytic ☆34 Updated 3 years ago
- General Vectorization Lib for Machine Learning Tools ☆31 Updated 8 years ago
- The code for the in memory data pipeline that was presented at Berlin Buzzwords 2015. ☆10 Updated 9 years ago
- Invoke Pandas plotting by piping in SQL output via PSQL (Can be used with Postgres or Greenplum or any SQL engine). ☆16 Updated 10 years ago
- DEPRECATED! Use https://github.com/h2oai/sparkling-water repository! H2O and Spark interoperability based on Tachyon. ☆44 Updated 10 years ago