fcibecchini / smart-crawler
A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extract data from them.
☆9 · Updated 4 years ago
Alternatives and similar repositories for smart-crawler
Users who are interested in smart-crawler are comparing it to the libraries listed below.
- Python and Scala APIs for enhanced Spark analytics ☆12 · Updated 8 years ago
- Code examples for Google Natural Language API. ☆13 · Updated 5 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 · Updated 3 years ago
- This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the re… ☆12 · Updated 9 months ago
- Short Text Similarity as described in https://dl.acm.org/citation.cfm?id=2806475 ☆16 · Updated 6 years ago
- Deep neural parser for database queries ☆18 · Updated 2 years ago
- Provides the implementation of a topic detection framework developed for the MULTISENSOR project. ☆9 · Updated 9 years ago
- ☆16 · Updated 8 years ago
- Text similarity based on Word2Vec vectors. ☆11 · Updated 8 years ago
- S2RDF (SPARQL on Spark for RDF) is a SPARQL query processor for Hadoop based on Spark SQL. It uses the relational interface of Spark for … ☆13 · Updated 7 years ago
- Opinion miner based on machine learning that can be trained on a corpus of KAF/NAF files ☆9 · Updated 6 years ago
- Open Collaborative AI-Driven Parser builder for Web Scraping, Data Extraction and Crawling, Knowledge Graph ☆1 · Updated 5 months ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 · Updated 12 years ago
- ☆37 · Updated 6 years ago
- Tweet Analysis with Spark ☆15 · Updated 7 years ago
- A subgroup discovery tool that can use ontological domain knowledge (RDF graphs) in the learning process. Subgroup descriptions contain t… ☆12 · Updated 7 years ago
- The classic movies redux with machine learning using TensorFlow and Keras. ☆11 · Updated 6 years ago
- Detecting Trends in Job Advertisements ☆20 · Updated 6 years ago
- Extract statistics from Wikipedia Dump files. ☆26 · Updated 3 years ago
- Regularized latent variable mixed membership modeling ☆13 · Updated 11 years ago
- Movielens collaborative filtering with Solr streaming expression ☆11 · Updated 8 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 · Updated 2 years ago
- A workflow system for Natural Language Processing. ☆21 · Updated 5 years ago
- ☆20 · Updated 8 years ago
- Neural Elastic Inference and Search ☆19 · Updated 5 years ago
- Code for the paper: "Cross-domain Semantic Parsing via Paraphrasing" - EMNLP 2017 ☆14 · Updated 6 years ago
- Code and Data Samples for Big Data Warehousing. ☆10 · Updated 9 years ago
- Mention-anomaly-based event detection and tracking in Twitter ☆17 · Updated 8 years ago
- Machine Learning for Cascading ☆82 · Updated 10 years ago
- A set of tools for performing Labeled Latent Dirichlet Allocation on textual datasets, with an emphasis on Twitter profiles. Contains too… ☆42 · Updated 3 years ago