kaqqao / nutch-element-selector
Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements
☆14Updated 3 years ago
Alternatives and similar repositories for nutch-element-selector:
Users who are interested in nutch-element-selector are comparing it to the repositories listed below
- This repository is outdated and will be discontinued. For latest code and information check: http://github.com/gpgmail/GPGMail☆54Updated 7 years ago
- Distributed Web Crawler, Parser and Search Engine.☆10Updated 8 years ago
- Perl CPAN module Makefile::GraphViz - Draw building flowcharts from Makefiles using GraphViz☆15Updated 10 years ago
- Collects multimedia content shared through social networks.☆19Updated 10 years ago
- Chambua is an open-source semantic tagging application that analyses text and extracts names of people, places (& geocodes them), organis…☆33Updated 3 years ago
- fuzzydb is a fuzzy matching database engine capable of providing human-like search results that make life much easier for users of websit…☆20Updated 2 years ago
- A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This …☆16Updated 8 years ago
- HTTP Shell is a CLI tool based on the Kui framework that provides developers a modern alternative to http clients for interacting with AP…☆12Updated 4 years ago
- Scraper built with Scrapy.☆17Updated 8 months ago
- Vizlinc☆14Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆16Updated 9 years ago
- A collection of best practices that we have learnt so far☆12Updated 11 years ago
- Hadoop MapReduce over Hive based implementation of attributed network pattern matching.☆40Updated 10 years ago
- Code and templates required to build the DARPA open catalog.☆17Updated 9 years ago
- Sandbox for Apache nifi☆24Updated 3 years ago
- Big GeoSpatial Data Points Visualization Tool☆19Updated 9 years ago
- Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine.☆23Updated 9 years ago
- Elasticsearch REPL built on top of Jest☆23Updated 9 years ago
- Open Semantic Search Appliance (VM)☆12Updated 4 years ago
- Terminus DB Schemas - Formal descriptions and documentation of all the internal data structures used by Terminus DB☆10Updated 5 years ago
- A conda-smithy repository for scikit-learn.☆7Updated 8 years ago
- History DB is a truly scalable (hundreds of millions of updates per day) distributed archive system with per user and per day activity stat…☆31Updated 11 years ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources.☆20Updated 7 months ago
- Very fast and noisy TCP port scanner☆9Updated 8 years ago
- Models and serializers for ontologies and related artifacts backed by 4store☆19Updated last week