fcibecchini / smart-crawler
A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extract data from them.
☆8 Updated 3 years ago
Related projects
Alternatives and complementary repositories for smart-crawler
- Python and Scala APIs for enhanced Spark analytics ☆11 Updated 7 years ago
- Code and Data Samples for Big Data Warehousing. ☆10 Updated 9 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 Updated 8 years ago
- ☆16 Updated 8 years ago
- Text similarity based on Word2Vec vectors. ☆10 Updated 7 years ago
- Provides the implementation of a topic detection framework developed for the MULTISENSOR project. ☆9 Updated 8 years ago
- A subgroup discovery tool that can use ontological domain knowledge (RDF graphs) in the learning process. Subgroup descriptions contain t… ☆13 Updated 7 years ago
- From zero to a Storm cluster for real-time classification using sklearn ☆12 Updated 10 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 9 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 Updated 8 years ago
- Notes from Stanford NLP class ☆24 Updated 11 years ago
- A Java framework to build semantics-aware autoencoder neural network from a knowledge-graph. ☆13 Updated 7 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 Updated 2 years ago
- ☆20 Updated 8 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 Updated 8 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 Updated 10 years ago
- Merck challenge at Kaggle ☆10 Updated 10 years ago
- Predicting sales with Pandas ☆15 Updated 9 years ago
- Query Expansion using word2vec ☆11 Updated 5 years ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem ☆12 Updated 3 years ago
- Machine Learning Open Source Software ☆23 Updated 6 years ago
- A set of tools for performing Labeled Latent Dirichlet Allocation on textual datasets, with an emphasis on Twitter profiles. Contains too… ☆42 Updated 2 years ago
- iCQA - Intelligent Community Question Answering Framework ☆32 Updated 8 years ago
- Implements some outlier detection algorithms ☆11 Updated 9 years ago