Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85 Jun 8, 2013 Updated 12 years ago
Alternatives and similar repositories for wikihadoop
Users that are interested in wikihadoop are comparing it to the libraries listed below.
- Codec for Hadoop adding OpenPGP encryption using Bouncy Castle ☆17 Aug 18, 2011 Updated 14 years ago
- A JRuby DSL for Cascading ☆41 Sep 23, 2015 Updated 10 years ago
- playing around with the common crawl dataset ☆70 Aug 18, 2012 Updated 13 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. ☆84 Aug 21, 2014 Updated 11 years ago
- ☆23 Aug 2, 2021 Updated 4 years ago
- Examples of use of pig scripting languages capabilities ☆39 Aug 1, 2016 Updated 9 years ago
- Pikes is a Knowledge Extraction Suite ☆23 Nov 14, 2023 Updated 2 years ago
- A KEDA external scaler for the Durable Task Azure Storage backend. ☆10 May 2, 2026 Updated last week
- A Hadoop toolkit for web-scale information retrieval research ☆86 Dec 12, 2014 Updated 11 years ago
- ☆25 Feb 23, 2012 Updated 14 years ago
- ruby client for Hadoop HBase ☆58 Mar 8, 2009 Updated 17 years ago
- Ruby Enumerator plumbing (ala Unix pipes) ☆25 Apr 18, 2013 Updated 13 years ago
- SQL Windowing Functions for Hadoop ☆65 Jun 20, 2022 Updated 3 years ago
- useful JVM classes for the mrjob hadoop streaming framework ☆31 Jun 20, 2013 Updated 12 years ago
- For Ruby: a simple way of doing heavy work in a background thread; when you really need the object, it will block until it is done ☆23 Apr 8, 2011 Updated 15 years ago
- zeromq input and output modules for rsyslog ☆98 Feb 24, 2012 Updated 14 years ago
- HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processi… ☆611 Updated this week
- http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf ☆14 Apr 25, 2012 Updated 14 years ago
- Cloud9 is a Hadoop toolkit for working with big data ☆236 Dec 15, 2015 Updated 10 years ago
- A beanstalkd (distributed task queue) clone in clojure ☆20 Dec 11, 2011 Updated 14 years ago
- Semantic Web database ☆19 Sep 1, 2022 Updated 3 years ago
- C-style enums for ruby ☆14 Jun 12, 2011 Updated 14 years ago
- GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework ☆293 Jun 29, 2022 Updated 3 years ago
- distributed latent dirichlet allocation ☆29 Dec 15, 2011 Updated 14 years ago
- Design of a specification for the automation of infrastructure deployments ☆24 Apr 6, 2022 Updated 4 years ago
- Bulk loading for elastic search ☆187 Dec 16, 2023 Updated 2 years ago
- Ruby gem for querying Apache Hive ☆98 Apr 29, 2021 Updated 5 years ago
- ☆11 Feb 13, 2026 Updated 2 months ago
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining" ☆48 Aug 2, 2010 Updated 15 years ago
- Ruby code to access Microsoft's Ngram data ☆20 Apr 12, 2012 Updated 14 years ago
- ☆12 Oct 25, 2015 Updated 10 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation ☆337 Sep 21, 2011 Updated 14 years ago
- Spell and pronounce words with a neural network ☆10 Feb 13, 2017 Updated 9 years ago
- State-of-The-Art Unsupervised Part-Of-Speech Type-Level Tagger in 300 Lines of Clojure ☆40 Sep 15, 2010 Updated 15 years ago
- Break up your haproxy configs and join them together ☆57 Sep 7, 2012 Updated 13 years ago
- Greek Syntax - Query the Greek New Testament with XQuery, XPath, and Python in Jupyter Notebooks ☆11 Aug 11, 2021 Updated 4 years ago
- An extension for eXist-db that allows the reading and writing of MARC into and out from the database ☆11 Mar 6, 2016 Updated 10 years ago
- simple HTTP proxy based on tproxy ☆28 May 5, 2011 Updated 15 years ago
- Ruby Linear Algebra Library ☆108 Jun 7, 2009 Updated 16 years ago