fcibecchini / smart-crawler
A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extract data from them.
☆9 Updated 4 years ago
Alternatives and similar repositories for smart-crawler:
Users who are interested in smart-crawler are comparing it to the libraries listed below.
- Code and Data Samples for Big Data Warehousing. ☆10 Updated 9 years ago
- Python and Scala APIs for enhanced Spark analytics ☆12 Updated 8 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- ☆16 Updated 8 years ago
- ☆11 Updated 9 years ago
- Text similarity based on Word2Vec vectors. ☆11 Updated 8 years ago
- Twitter sentiment analysis using Spark and Stanford CoreNLP and visualization using Elasticsearch and Kibana ☆20 Updated 7 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem ☆12 Updated 3 months ago
- Exploration Library in Java ☆12 Updated last year
- Movielens collaborative filtering with Solr streaming expression ☆11 Updated 8 years ago
- The first Open Source document analysis platform ☆65 Updated 3 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase ☆14 Updated 9 years ago
- Java library for Concrete, a data serialization format for NLP ☆6 Updated 5 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- ☆37 Updated 6 years ago
- Query Spark in real time and visualise it as a graph. ☆24 Updated 7 years ago
- A subgroup discovery tool that can use ontological domain knowledge (RDF graphs) in the learning process. Subgroup descriptions contain t… ☆12 Updated 7 years ago
- Sample code for Splice Community ☆10 Updated 2 years ago
- VoltDB Click Stream Processing Example. ☆16 Updated 7 years ago
- Provided Guidance on Creating End to End Solutions for Common SILK Use Cases ☆13 Updated 9 years ago
- "BI Glue" Business Intelligence middleware library for aggregation of metrics/KPI from any source and custom reporting for humans or othe… ☆10 Updated 10 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 12 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 Updated 11 years ago
- Distributed Dexecutor Using Ignite ☆10 Updated 7 years ago
- Dump MySQL tables to S3, and parse them ☆31 Updated 10 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆17 Updated 2 years ago
- Deep neural parser for database query ☆18 Updated 2 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services ☆26 Updated 2 months ago
- NiFi Bundle for FIX Protocol ☆16 Updated 7 years ago
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ☆9 Updated 4 years ago