kaqqao / nutch-element-selector
Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements
☆14 Updated 2 years ago
Alternatives and similar repositories for nutch-element-selector
Users who are interested in nutch-element-selector are comparing it to the libraries listed below:
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- ☆13 Updated 9 years ago
- Focused Crawler for VT's CTRNet ☆10 Updated 11 years ago
- Vizlinc ☆14 Updated 9 years ago
- An academic open source and open data web crawler ☆27 Updated 7 years ago
- Scraper built with Scrapy. ☆15 Updated 7 months ago
- ☆10 Updated 7 years ago
- This is a set of ontologies used by different parts of the Open Semantic Framework. These ontologies should normally be loaded in OSF usi… ☆14 Updated 11 years ago
- The first Open Source document analysis platform ☆65 Updated 3 years ago
- ☆49 Updated 8 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago
- Distributed Web Crawler, Parser and Search Engine. ☆10 Updated 8 years ago
- A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This … ☆16 Updated 8 years ago
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset. ☆39 Updated 11 years ago
- Contains the implementation of algorithms that estimate the geographic location of media content based on their content and metadata. It … ☆15 Updated 8 years ago
- HTTP Shell is a CLI tool based on the Kui framework that provides developers a modern alternative to http clients for interacting with AP… ☆12 Updated 4 years ago
- Sandbox for Apache NiFi ☆24 Updated 3 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 Updated 8 years ago
- Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine. ☆23 Updated 9 years ago
- Open Semantic Search Appliance (VM) ☆12 Updated 4 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- For interacting with Nutch via Python ☆24 Updated last month
- ☆20 Updated 8 years ago
- Very fast and noisy TCP port scanner ☆9 Updated 8 years ago
- Models and serializers for ontologies and related artifacts backed by 4store ☆19 Updated last week
- Using social media to steer web archiving and curation. ☆15 Updated 9 years ago
- RDFpro ☆12 Updated 3 years ago
- KnowledgeStore ☆20 Updated 7 years ago
- Fcrepo4 webapp plus optional fcrepo dependencies ☆13 Updated 4 years ago
- fuzzydb is a fuzzy matching database engine capable of providing human-like search results that make life much easier for users of websit… ☆19 Updated last year