Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85, updated Jun 8, 2013
Alternatives and similar repositories for wikihadoop
Users interested in wikihadoop are comparing it to the libraries listed below.
- Code for aggregating Wikipedia traffic statistics (☆36, updated May 25, 2013)
- A JRuby DSL for Cascading (☆41, updated Sep 23, 2015)
- Playing around with the Common Crawl dataset (☆70, updated Aug 18, 2012)
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more (☆22, updated Oct 29, 2016)
- ☆23, updated Aug 2, 2021
- Examples of using the Pig scripting language's capabilities (☆39, updated Aug 1, 2016)
- ☆17, updated Sep 7, 2011
- S3 log bucket parser app for Django (☆15, updated Sep 12, 2011)
- Implementation of Tyler Neylon's Locality-Sensitive Hash based on simplex tessellations (☆28, updated Oct 15, 2011)
- A KEDA external scaler for the Durable Task Azure Storage backend (☆10, updated this week)
- Clojure wrapper for LDA topic modeling in MALLET (☆33, updated Sep 6, 2011)
- Useful JVM classes for the mrjob Hadoop streaming framework (☆31, updated Jun 20, 2013)
- http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf (☆14, updated Apr 25, 2012)
- Cloud9 is a Hadoop toolkit for working with big data (☆237, updated Dec 15, 2015)
- ☆26, updated Mar 19, 2012
- A beanstalkd (distributed task queue) clone in Clojure (☆20, updated Dec 11, 2011)
- Semantic Web database (☆19, updated Sep 1, 2022)
- C-style enums for Ruby (☆14, updated Jun 12, 2011)
- GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework (☆293, updated Jun 29, 2022)
- Distributed latent Dirichlet allocation (☆29, updated Dec 15, 2011)
- Go toolchain written in Rust (☆10, updated this week)
- vCat Java code (☆11, updated Jun 7, 2026)
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining" (☆48, updated Aug 2, 2010)
- A Perl Semantic Web Framework (☆19, updated May 20, 2026)
- ☆12, updated Oct 25, 2015
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation (☆337, updated Sep 21, 2011)
- Crime Doesn't Climb in San Francisco (☆100, updated Jan 29, 2014)
- Docker image builder for eXist-db (☆13, updated Mar 16, 2021)
- State-of-The-Art Unsupervised Part-Of-Speech Type-Level Tagger in 300 Lines of Clojure (☆41, updated Sep 15, 2010)
- Greek Syntax - Query the Greek New Testament with XQuery, XPath, and Python in Jupyter Notebooks (☆11, updated Aug 11, 2021)
- An extension for eXist-db that allows the reading and writing of MARC into and out from the database (☆11, updated Mar 6, 2016)
- Simple HTTP proxy based on tproxy (☆27, updated May 5, 2011)
- Ruby Linear Algebra Library (☆108, updated Jun 7, 2009)
- An implementation of Protocol Buffers for Ruby (☆58, updated Feb 20, 2013)
- ☆10, updated Apr 20, 2016
- CoffeePot releases and website pages, see nineml/nineml (☆13, updated Dec 26, 2025)
- Mahout vector encoding for Pig (☆53, updated Nov 20, 2022)
- Example of Rust API for Machine Learning (☆19, updated Sep 11, 2021)
- SynopsX is a lightweight XML publishing framework (☆13, updated Jun 5, 2026)