ept / warc-hadoop
WARC (Web Archive) Input and Output Formats for Hadoop
☆35 Updated 10 years ago
Alternatives and similar repositories for warc-hadoop:
Users who are interested in warc-hadoop compare it to the libraries listed below.
- Scala utilities for teaching computational linguistics and prototyping algorithms. ☆42 Updated 12 years ago
- Apache OpenNLP Sandbox ☆42 Updated this week
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool ☆148 Updated 3 years ago
- A toolkit that wraps various natural language processing implementations behind a common interface. ☆101 Updated 7 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆92 Updated 9 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆33 Updated last year
- Use Cascading Taps and Scalding DSL with Spark ☆49 Updated 8 years ago
- Using deep learning to POS tag sentences using Scala + DL4J ☆37 Updated 9 years ago
- A Scala wrapper for CoreNLP ☆40 Updated 9 years ago
- ☆49 Updated 8 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆281 Updated 6 years ago
- Puck is a lightning-fast parser for natural languages using GPUs ☆249 Updated 10 years ago
- Templates for projects based on top of H2O. ☆37 Updated this week
- Solr Dictionary Annotator (Microservice for Spark) ☆71 Updated 5 years ago
- ☆92 Updated 9 years ago
- Educational Example of a custom Lucene Query & Scorer ☆48 Updated 5 years ago
- NLP tools developed by Emory University. ☆60 Updated 8 years ago
- Java implementation of the TextRank algorithm by Mihalcea, et al. ☆75 Updated 4 years ago
- A new object-graph-wrapper for the TinkerPop 3 graph stack. ☆40 Updated 3 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆213 Updated 2 years ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop ☆38 Updated 3 months ago
- A Java library for stored queries ☆16 Updated last year
- ☆41 Updated 7 years ago
- A package full of linear algebra operators for Apache Spark MLlib's linalg package ☆10 Updated 9 years ago
- Day 20 demo application ☆50 Updated 11 years ago
- Mirror of Apache Stanbol (incubating) ☆112 Updated last year
- Alenka JDBC is a library for accessing and manipulating data with the open-source GPU database Alenka. ☆19 Updated 10 years ago
- NER tagger for English, Spanish, Dutch, Italian, German, and French. ☆35 Updated 9 years ago
- The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning ☆133 Updated 3 years ago
- Additional OpenNLP mapping type for Elasticsearch in order to perform named entity recognition ☆136 Updated 8 years ago