Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85 · Jun 8, 2013 · Updated 12 years ago
Alternatives and similar repositories for wikihadoop
Users that are interested in wikihadoop are comparing it to the libraries listed below.
- Code for aggregating wikipedia traffic statistics ☆36 · May 25, 2013 · Updated 12 years ago
- Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a … ☆50 · Jul 4, 2011 · Updated 14 years ago
- A JRuby DSL for Cascading ☆41 · Sep 23, 2015 · Updated 10 years ago
- playing around with the common crawl dataset ☆70 · Aug 18, 2012 · Updated 13 years ago
- ☆23 · Aug 2, 2021 · Updated 4 years ago
- Examples of use of pig scripting languages capabilities ☆39 · Aug 1, 2016 · Updated 9 years ago
- A simple column reader for ActiveRecord ☆13 · Nov 1, 2011 · Updated 14 years ago
- Pikes is a Knowledge Extraction Suite ☆23 · Nov 14, 2023 · Updated 2 years ago
- Implementation of Tyler Neylon's Locality-Specific Hash based on simplex tesselations ☆28 · Oct 15, 2011 · Updated 14 years ago
- A Hadoop toolkit for web-scale information retrieval research ☆85 · Dec 12, 2014 · Updated 11 years ago
- Clojure wrapper for LDA topic modeling in MALLET ☆33 · Sep 6, 2011 · Updated 14 years ago
- ruby client for Hadoop HBase ☆58 · Mar 8, 2009 · Updated 17 years ago
- SQL Windowing Functions for Hadoop ☆65 · Jun 20, 2022 · Updated 3 years ago
- useful JVM classes for the mrjob hadoop streaming framework ☆31 · Jun 20, 2013 · Updated 12 years ago
- zeromq input and output modules for rsyslog ☆98 · Feb 24, 2012 · Updated 14 years ago
- http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf ☆14 · Apr 25, 2012 · Updated 13 years ago
- ☆26 · Mar 19, 2012 · Updated 14 years ago
- Cloud9 is a Hadoop toolkit for working with big data ☆236 · Dec 15, 2015 · Updated 10 years ago
- JVMTI agent which calls mlockall and setuids down to a target user upon initialization ☆21 · Sep 13, 2011 · Updated 14 years ago
- Linked Data explorer and SPARQL endpoint ☆23 · Dec 15, 2021 · Updated 4 years ago
- A beanstalkd (distributed task queue) clone in clojure ☆20 · Dec 11, 2011 · Updated 14 years ago
- Semantic Web database ☆19 · Sep 1, 2022 · Updated 3 years ago
- GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework ☆294 · Jun 29, 2022 · Updated 3 years ago
- distributed latent dirichlet allocation ☆29 · Dec 15, 2011 · Updated 14 years ago
- Design of a specification for the automation of infrastructure deployments ☆24 · Apr 6, 2022 · Updated 4 years ago
- Bulk loading for elastic search ☆187 · Dec 16, 2023 · Updated 2 years ago
- vCat Java code ☆11 · Apr 6, 2026 · Updated 2 weeks ago
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining" ☆48 · Aug 2, 2010 · Updated 15 years ago
- ☆11 · Feb 13, 2026 · Updated 2 months ago
- A Perl Semantic Web Framework ☆19 · Jan 23, 2025 · Updated last year
- Ruby code to access Microsoft's Ngram data ☆20 · Apr 12, 2012 · Updated 14 years ago
- ☆12 · Oct 25, 2015 · Updated 10 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation ☆337 · Sep 21, 2011 · Updated 14 years ago
- Crime Doesn't Climb in San Francisco ☆100 · Jan 29, 2014 · Updated 12 years ago
- State-of-The-Art Unsupervised Part-Of-Speech Type-Level Tagger in 300 Lines of Clojure ☆40 · Sep 15, 2010 · Updated 15 years ago
- Greek Syntax - Query the Greek New Testament with XQuery, XPath, and Python in Jupyter Notebooks ☆11 · Aug 11, 2021 · Updated 4 years ago
- mruby-r: Use (m)Ruby for returning data to R ☆26 · Aug 8, 2015 · Updated 10 years ago
- An implementation of Protocol Buffers for Ruby. ☆58 · Feb 20, 2013 · Updated 13 years ago
- VMWare Cli for ESX, ESXi and Converter ☆18 · Apr 22, 2015 · Updated 10 years ago