Smerity / cc-warc-examples
CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop
☆57Updated 4 years ago
Alternatives and similar repositories for cc-warc-examples
Users interested in cc-warc-examples are comparing it to the libraries listed below.
- Common web archive utility code.☆56Updated 2 months ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.☆283Updated 7 years ago
- Mirror of Apache Stanbol (incubating)☆114Updated last year
- Solr Dictionary Annotator (Microservice for Spark)☆71Updated 5 years ago
- Warcbase is an open-source platform for managing and analyzing web archives☆162Updated 7 years ago
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS☆87Updated 8 years ago
- A text tagger based on Lucene / Solr, using FST technology☆177Updated last year
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop☆37Updated 10 months ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json/avro dump☆254Updated last year
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool☆148Updated 3 years ago
- Approve or reject statements from third-party datasets☆146Updated 7 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34Updated 2 years ago
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)☆54Updated 7 years ago
- SKOS Support for Apache Lucene and Solr☆56Updated 4 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri☆47Updated 3 years ago
- A set of reusable Java components that implement functionality common to any web crawler☆246Updated 3 weeks ago
- RDF store on a cloud-based architecture (previously on https://code.google.com/p/cumulusrdf)☆31Updated 9 years ago
- Solr AutoComplete implementation☆59Updated 8 years ago
- UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.☆35Updated 2 years ago
- NEWS: JATE2.0 Beta.11 Released, see details below.☆82Updated 2 years ago
- The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies…☆95Updated 7 years ago
- The linked open dataset described at http://datahub.io/dataset/vu-wordnet, and the tools used to create it☆25Updated 5 years ago
- Search a single field with different query time analyzers in Solr☆25Updated 5 years ago
- Simple search results with Solr and EmberJS☆58Updated 6 years ago
- Dice Solr Plugins from Simon Hughes Dice.com☆88Updated 4 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition☆136Updated 9 years ago
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata☆27Updated 7 years ago
- An RDF plugin for Solr☆115Updated 8 months ago
- ☆185Updated 6 years ago
- English Dependency Relationship Extractor☆85Updated 10 months ago