Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85 · Jun 8, 2013 · Updated 12 years ago
Alternatives and similar repositories for wikihadoop
Users who are interested in wikihadoop are comparing it to the libraries listed below.
- Code for aggregating Wikipedia traffic statistics · ☆36 · May 25, 2013 · Updated 12 years ago
- A JRuby DSL for Cascading · ☆41 · Sep 23, 2015 · Updated 10 years ago
- Playing around with the Common Crawl dataset · ☆70 · Aug 18, 2012 · Updated 13 years ago
- For Ruby: a simple way of doing heavy work in a background thread; when you really need the object, it blocks until the work is done · ☆23 · Apr 8, 2011 · Updated 14 years ago
- ☆25 · Feb 23, 2012 · Updated 14 years ago
- A beanstalkd (distributed task queue) clone in Clojure · ☆20 · Dec 11, 2011 · Updated 14 years ago
- Distributed latent Dirichlet allocation · ☆29 · Dec 15, 2011 · Updated 14 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. · ☆84 · Aug 21, 2014 · Updated 11 years ago
- Pikes is a Knowledge Extraction Suite · ☆23 · Nov 14, 2023 · Updated 2 years ago
- Examples of using Pig's scripting-language capabilities · ☆39 · Aug 1, 2016 · Updated 9 years ago
- An implementation of Protocol Buffers for Ruby. · ☆58 · Feb 20, 2013 · Updated 13 years ago
- ZeroMQ input and output modules for rsyslog · ☆98 · Feb 24, 2012 · Updated 14 years ago
- GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework · ☆294 · Jun 29, 2022 · Updated 3 years ago
- mruby-r: Use (m)Ruby for returning data to R · ☆26 · Aug 8, 2015 · Updated 10 years ago
- A set of convenience functions in R for exploring iPhone and iPad location data · ☆37 · Apr 25, 2011 · Updated 14 years ago
- A back end as a service based on MongoDB · ☆57 · Aug 18, 2015 · Updated 10 years ago
- Bugspots for Subversion · ☆15 · Jan 17, 2012 · Updated 14 years ago
- Introduction to Singularity containers and the Slurm job scheduler. · ☆14 · May 30, 2024 · Updated last year
- A KEDA external scaler for the Durable Task Azure Storage backend. · ☆10 · Updated this week
- A Hadoop toolkit for web-scale information retrieval research · ☆85 · Dec 12, 2014 · Updated 11 years ago
- This repo contains my coursework and assignments for the IBM RAG and Agentic AI Professional Certificate on Coursera · ☆24 · Jul 9, 2025 · Updated 8 months ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation · ☆337 · Sep 21, 2011 · Updated 14 years ago
- Port of Android Notepad tutorial to Mirah · ☆14 · Nov 27, 2025 · Updated 3 months ago
- A Node.js binding for CRFsuite · ☆14 · Feb 26, 2024 · Updated 2 years ago
- Home of the open-source Redis Watch, a newsletter about Everything and Anything Redis · ☆11 · Oct 24, 2019 · Updated 6 years ago
- Example of how to append data to a Haskell executable using SQLite · ☆10 · Mar 16, 2020 · Updated 5 years ago
- BPR recommender system · ☆10 · Apr 14, 2018 · Updated 7 years ago
- A bookmarklet that directs you back to Hacker News' comments page · ☆19 · Jun 2, 2011 · Updated 14 years ago
- Contacts API on steroids · ☆17 · Dec 21, 2011 · Updated 14 years ago
- ☆12 · Mar 29, 2011 · Updated 14 years ago
- Docker image builder for eXist-db · ☆13 · Mar 16, 2021 · Updated 4 years ago
- Utility classes for dense and sparse matrices in JCuda · ☆11 · Mar 8, 2019 · Updated 7 years ago
- CRUD Generator for PlayFramework 2 (2.5.x)/Slick (3.1.x)/Scala · ☆25 · Apr 3, 2016 · Updated 9 years ago
- Simple HTTP redirector for tmpnb nodes · ☆12 · Sep 20, 2017 · Updated 8 years ago
- Wrapper for generating PROV provenance information for commands and Python scripts · ☆15 · Oct 14, 2014 · Updated 11 years ago
- One Ruby gem to rule the zanox API · ☆15 · Apr 19, 2011 · Updated 14 years ago
- An EasyMock-inspired mocking library for Erlang. · ☆23 · Mar 28, 2023 · Updated 2 years ago
- Deep Learning Toolbox in Matlab · ☆13 · Oct 10, 2017 · Updated 8 years ago
- An API for launching, configuring, and maintaining services in the clouds · ☆17 · Apr 15, 2015 · Updated 10 years ago