kaqqao / nutch-element-selector
Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements
☆14 Updated 3 years ago
Alternatives and similar repositories for nutch-element-selector
Users who are interested in nutch-element-selector are comparing it to the libraries listed below.
- Distributed Web Crawler, Parser and Search Engine. ☆10 Updated 8 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- ☆49 Updated 8 years ago
- A Storm based web crawler with Cassandra backend ☆28 Updated 11 years ago
- An academic open source and open data web crawler ☆27 Updated 7 years ago
- The first Open Source document analysis platform ☆65 Updated 3 years ago
- VoltDB Click Stream Processing Example. ☆16 Updated 7 years ago
- Storm / Solr Integration ☆19 Updated last year
- Sandbox for Apache nifi ☆24 Updated 3 years ago
- fuzzydb is a fuzzy matching database engine capable of providing human-like search results that make life much easier for users of websit… ☆20 Updated 2 years ago
- Focused Crawler for VT's CTRNet ☆10 Updated 12 years ago
- This is a set of ontologies used by different parts of the Open Semantic Framework. These ontologies should normally be loaded in OSF usi… ☆14 Updated 11 years ago
- A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This … ☆16 Updated 8 years ago
- Scraper built with Scrapy. ☆17 Updated 9 months ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources. ☆20 Updated 8 months ago
- Usage examples for Divolte collector ☆17 Updated 7 years ago
- Home of RDF2Go and RDFReactor ☆13 Updated 8 years ago
- Free, lightweight, asynchronous, streaming, scriptable, reverse proxy ☆9 Updated last year
- Linked Open Vocabularies (LOV) - scripts ☆9 Updated 8 years ago
- ☆55 Updated 5 years ago
- ☆10 Updated 7 years ago
- ***Warning*** Old Apache Flink Graph API: This repository is not in use anymore. ☆15 Updated 9 years ago
- Vizlinc ☆15 Updated 9 years ago
- MetaSync ☆20 Updated 9 years ago
- A generator for synthetic streams of financial transactions. ☆16 Updated 11 years ago
- Wikipedia River Plugin for elasticsearch (STOPPED) ☆74 Updated 2 years ago
- Code to index HDFS to Solr using MapReduce ☆52 Updated 6 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 Updated 9 years ago
- Common web archive utility code. ☆55 Updated 2 weeks ago
- iServe is what we refer to as service warehouse which unifies service publication, analysis, and discovery through the use of lightweigh… ☆24 Updated 9 years ago