whym / wikihadoop
Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85 · Updated 11 years ago
Alternatives and similar repositories for wikihadoop:
Users who are interested in wikihadoop are comparing it to the libraries listed below.
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.