fcibecchini / smart-crawler
A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extract data from them.
☆9 · Updated 4 years ago
Alternatives and similar repositories for smart-crawler
Users who are interested in smart-crawler are comparing it to the libraries listed below.
- Python and Scala APIs for enhanced Spark analytics ☆12 · Updated 8 years ago
- Text similarity based on Word2Vec vectors. ☆11 · Updated 8 years ago
- Collects multimedia content shared through social networks. ☆19 · Updated 10 years ago
- Dump mysql tables to s3, and parse them ☆31 · Updated 10 years ago
- ☆16 · Updated 8 years ago
- phData Pulse application log aggregation and monitoring ☆13 · Updated 5 years ago
- My dot files in one place - extensively edited over time. Your mileage may vary ☆2 · Updated 9 years ago
- ☆11 · Updated 9 years ago
- Twitter sentiment analysis using Spark and Stanford CoreNLP and visualization using elasticsearch and kibana ☆20 · Updated 7 years ago
- A bridge to Apache Atlas for provenance metadata created in course of using Apache NiFi ☆15 · Updated 2 years ago
- Sentiment analysis framework developed by CERTH. ☆22 · Updated 10 years ago
- ☆11 · Updated 9 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem ☆12 · Updated 5 months ago
- Real-time query spark and visualise it as graph. ☆24 · Updated 7 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 · Updated 8 years ago
- The first Open Source document analysis platform ☆65 · Updated 3 years ago
- Sample code for Splice Community ☆10 · Updated 2 years ago
- Apache Pig plugin for Eclipse ☆12 · Updated 8 years ago
- Deep learning certificate part 1 ☆10 · Updated 3 years ago
- Real time and offline time series analysis with Spark, Spark Streaming and Storm ☆21 · Updated 4 years ago
- Visualization of results returned by a Solr 6 graph query ☆10 · Updated 9 years ago
- An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase ☆14 · Updated 9 years ago
- Document Image Classification ☆11 · Updated 7 years ago
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 · Updated 2 years ago
- The classic movies redux with machine learning using TensorFlow and Keras. ☆11 · Updated 6 years ago
- Provides the implementation of a topic detection framework developed for the MULTISENSOR project. ☆9 · Updated 9 years ago
- Example application demonstrating how to integrate all of the components of Hortonworks DataFlow. ☆14 · Updated 7 years ago
- from zero to storm cluster for realtime classification using sklearn ☆12 · Updated 10 years ago
- Code and Data Samples for Big Data Warehousing. ☆10 · Updated 9 years ago
- Code examples for Google Natural Language API. ☆13 · Updated 5 years ago