Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85 · Jun 8, 2013 · Updated 12 years ago
Alternatives and similar repositories for wikihadoop
Users who are interested in wikihadoop are comparing it to the libraries listed below.
- Code for aggregating Wikipedia traffic statistics · ☆36 · May 25, 2013 · Updated 12 years ago
- Codec for Hadoop adding OpenPGP encryption using Bouncy Castle · ☆17 · Aug 18, 2011 · Updated 14 years ago
- Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a … · ☆50 · Jul 4, 2011 · Updated 14 years ago
- A JRuby DSL for Cascading · ☆41 · Sep 23, 2015 · Updated 10 years ago
- Playing around with the Common Crawl dataset · ☆70 · Aug 18, 2012 · Updated 13 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. · ☆84 · Aug 21, 2014 · Updated 11 years ago
- Examples of using Pig's scripting-language capabilities · ☆39 · Aug 1, 2016 · Updated 9 years ago
- ☆17 · Sep 7, 2011 · Updated 14 years ago
- AmIUnique extension for Chrome · ☆10 · Apr 3, 2019 · Updated 6 years ago
- Implementation of Tyler Neylon's locality-sensitive hash based on simplex tessellations · ☆28 · Oct 15, 2011 · Updated 14 years ago
- A KEDA external scaler for the Durable Task Azure Storage backend · ☆10 · Updated this week
- A Hadoop toolkit for web-scale information retrieval research · ☆85 · Dec 12, 2014 · Updated 11 years ago
- ☆25 · Feb 23, 2012 · Updated 14 years ago
- Ruby client for Hadoop HBase · ☆58 · Mar 8, 2009 · Updated 17 years ago
- Ruby Enumerator plumbing (à la Unix pipes) · ☆25 · Apr 18, 2013 · Updated 12 years ago
- SQL windowing functions for Hadoop · ☆65 · Jun 20, 2022 · Updated 3 years ago
- Useful JVM classes for the mrjob Hadoop Streaming framework · ☆31 · Jun 20, 2013 · Updated 12 years ago
- For Ruby: a simple way of doing heavy work in a background thread; when you really need the object, it blocks until the work is done · ☆23 · Apr 8, 2011 · Updated 14 years ago
- ZeroMQ input and output modules for rsyslog · ☆98 · Feb 24, 2012 · Updated 14 years ago
- HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processi… · ☆612 · Updated this week
- http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf · ☆14 · Apr 25, 2012 · Updated 13 years ago
- SimMetrics is a similarity metric library, based on previous work by http://sourceforge.net/projects/simmetrics/ · ☆11 · Aug 25, 2016 · Updated 9 years ago
- Cloud9 is a Hadoop toolkit for working with big data · ☆236 · Dec 15, 2015 · Updated 10 years ago
- A beanstalkd (distributed task queue) clone in Clojure · ☆20 · Dec 11, 2011 · Updated 14 years ago
- C-style enums for Ruby · ☆14 · Jun 12, 2011 · Updated 14 years ago
- GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework · ☆294 · Jun 29, 2022 · Updated 3 years ago
- Distributed Latent Dirichlet Allocation · ☆29 · Dec 15, 2011 · Updated 14 years ago
- Design of a specification for the automation of infrastructure deployments · ☆24 · Apr 6, 2022 · Updated 3 years ago
- Bulk loading for Elasticsearch · ☆187 · Dec 16, 2023 · Updated 2 years ago
- Ruby gem for querying Apache Hive · ☆98 · Apr 29, 2021 · Updated 4 years ago
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining" · ☆48 · Aug 2, 2010 · Updated 15 years ago
- ⭐️⭐️⭐️⭐️⭐️ A 5-star rating widget implemented in JS and CSS · ☆23 · Oct 15, 2019 · Updated 6 years ago
- ☆12 · Oct 25, 2015 · Updated 10 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation · ☆337 · Sep 21, 2011 · Updated 14 years ago
- Crime Doesn't Climb in San Francisco · ☆100 · Jan 29, 2014 · Updated 12 years ago
- Spell and pronounce words with a neural network · ☆10 · Feb 13, 2017 · Updated 9 years ago
- Break up your HAProxy configs and join them together · ☆57 · Sep 7, 2012 · Updated 13 years ago
- An extension for eXist-db that allows reading and writing MARC data into and out of the database · ☆11 · Mar 6, 2016 · Updated 10 years ago
- Talk: minimal standalone keynote software using HTML5 rendering and Markdown editing · ☆13 · Oct 28, 2016 · Updated 9 years ago