kaqqao / nutch-element-selector
Nutch 2.3.1 plugin for whitelisting/blacklisting specific HTML elements
☆14Updated 2 years ago
Alternatives and similar repositories for nutch-element-selector:
Users who are interested in nutch-element-selector are comparing it to the repositories listed below
- Collects multimedia content shared through social networks.☆19Updated 10 years ago
- Distributed Web Crawler, Parser and Search Engine.☆10Updated 8 years ago
- ☆13Updated 9 years ago
- Web Tables Automatic Property Mapping☆7Updated 5 years ago
- Bicycle Incident reporting☆13Updated 2 years ago
- Vizlinc☆14Updated 9 years ago
- Digital file signing and signature verification utility☆17Updated 9 years ago
- HTTP Shell is a CLI tool based on the Kui framework that provides developers a modern alternative to http clients for interacting with AP…☆12Updated 4 years ago
- Elasticsearch REPL built on top of Jest☆23Updated 9 years ago
- ☆49Updated 7 years ago
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset.☆39Updated 11 years ago
- Open Source Social Media Monitoring And Engagement System Core/API☆36Updated 10 years ago
- An academic open source and open data web crawler☆27Updated 7 years ago
- Sandbox for Apache nifi☆24Updated 3 years ago
- Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine.☆23Updated 9 years ago
- This repository is outdated and will be discontinued. For latest code and information check: http://github.com/gpgmail/GPGMail☆54Updated 6 years ago
- Masques is a distributed social network.☆36Updated 8 years ago
- Build simple social graphs for GitHub☆15Updated 9 years ago
- Storm / Solr Integration☆19Updated last year
- OGDL for C☆17Updated 7 years ago
- Real time visualization of tweets.☆11Updated 10 years ago
- A Storm based web crawler with Cassandra backend☆28Updated 11 years ago
- Code to index HDFS to Solr using MapReduce☆52Updated 6 years ago
- Chambua is an open-source semantic tagging application that analyses text and extracts names of people, places (& geocodes them), organis…☆33Updated 3 years ago
- Treat curl configuration files as curlrc subcommands.☆11Updated 3 years ago
- The first Open Source document analysis platform☆65Updated 3 years ago
- A crawler, indexer, and query interface all in Python with distributed processing via Pyro4.☆23Updated 12 years ago
- Provided Guidance on Creating End to End Solutions for Common SILK Use Cases☆13Updated 9 years ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash)☆20Updated 3 years ago
- mirror just the newest versions of things in the CPAN☆26Updated 8 months ago