fcibecchini / smart-crawler
A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extract data from them.
☆8 Updated 3 years ago
Related projects:
- Python and Scala APIs for enhanced Spark analytics ☆11 Updated 7 years ago
- ☆16 Updated 8 years ago
- Code and Data Samples for Big Data Warehousing. ☆10 Updated 9 years ago
- Text similarity based on Word2Vec vectors. ☆10 Updated 7 years ago
- phData Pulse application log aggregation and monitoring ☆13 Updated 4 years ago
- Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP ☆15 Updated 7 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Twitter sentiment analysis using Spark and Stanford CoreNLP and visualization using elasticsearch and kibana ☆20 Updated 6 years ago
- Short Text Similarity as described in https://dl.acm.org/citation.cfm?id=2806475 ☆16 Updated 5 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆16 Updated 2 years ago
- Neural Elastic Inference and Search ☆19 Updated 4 years ago
- KnowledgeStore ☆20 Updated 6 years ago
- from zero to storm cluster for realtime classification using sklearn ☆12 Updated 10 years ago
- ☆15 Updated 6 years ago
- Real-time query spark and visualise it as graph. ☆24 Updated 6 years ago
- Provides the implementation of a topic detection framework developed for the MULTISENSOR project. ☆9 Updated 8 years ago
- System for mining Wikipedia Usage data to read our collective mind ☆21 Updated 9 years ago
- Deep learning certificate part 1 ☆10 Updated 2 years ago
- A Java framework to build semantics-aware autoencoder neural network from a knowledge-graph. ☆13 Updated 6 years ago
- Sample code for Splice Community ☆10 Updated 2 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem ☆12 Updated 3 years ago
- Library for building reproducible data pipelines to support experimentation ☆20 Updated 8 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 Updated 10 years ago
- Examples of using SparklingPandas and Pandas with PySpark ☆15 Updated 9 years ago
- VoltDB Click Stream Processing Example. ☆16 Updated 6 years ago
- An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase ☆14 Updated 8 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 9 years ago
- Exploration Library in Java ☆12 Updated last year
- Dump mysql tables to s3, and parse them ☆31 Updated 9 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 Updated 7 years ago