Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85Jun 8, 2013Updated 12 years ago
Alternatives and similar repositories for wikihadoop
Users that are interested in wikihadoop are comparing it to the libraries listed below.
- A JRuby DSL for Cascading☆41Sep 23, 2015Updated 10 years ago
- playing around with the common crawl dataset☆70Aug 18, 2012Updated 13 years ago
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more☆22Oct 29, 2016Updated 9 years ago
- Orderable models for Django☆31Dec 30, 2014Updated 11 years ago
- ☆23Aug 2, 2021Updated 4 years ago
- A simple column reader for ActiveRecord☆13Nov 1, 2011Updated 14 years ago
- Pikes is a Knowledge Extraction Suite☆23Nov 14, 2023Updated 2 years ago
- Implementation of Tyler Neylon's Locality-Specific Hash based on simplex tesselations☆28Oct 15, 2011Updated 14 years ago
- A Hadoop toolkit for web-scale information retrieval research☆86Dec 12, 2014Updated 11 years ago
- ☆25Feb 23, 2012Updated 14 years ago
- Fast and trainable tokenizer for natural languages relying on maximum entropy methods.☆23May 2, 2017Updated 9 years ago
- SQL Windowing Functions for Hadoop☆65Jun 20, 2022Updated 3 years ago
- useful JVM classes for the mrjob hadoop streaming framework☆31Jun 20, 2013Updated 12 years ago
- HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processi…☆610May 23, 2026Updated last week
- http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf☆14Apr 25, 2012Updated 14 years ago
- Cloud9 is a Hadoop toolkit for working with big data☆236Dec 15, 2015Updated 10 years ago
- ☆26Mar 19, 2012Updated 14 years ago
- JVMTI agent which calls mlockall and setuids down to a target user upon initialization☆21Sep 13, 2011Updated 14 years ago
- A beanstalkd (distributed task queue) clone in clojure☆20Dec 11, 2011Updated 14 years ago
- Semantic Web database☆19Sep 1, 2022Updated 3 years ago
- C-style enums for ruby☆14Jun 12, 2011Updated 14 years ago
- GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework☆293Jun 29, 2022Updated 3 years ago
- distributed latent dirichlet allocation☆29Dec 15, 2011Updated 14 years ago
- Design of a specification for the automation of infrastructure deployments☆24Apr 6, 2022Updated 4 years ago
- Bulk loading for elastic search☆187Dec 16, 2023Updated 2 years ago
- 🐼 Easy to use and portable pronunciation data for Hanzi characters.☆15Feb 27, 2017Updated 9 years ago
- vCat Java code☆11Updated this week
- ⭐️⭐️⭐️⭐️⭐️ A 5-star rating widget implemented in JS and CSS☆23Oct 15, 2019Updated 6 years ago
- ☆11Feb 13, 2026Updated 3 months ago
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining"☆48Aug 2, 2010Updated 15 years ago
- ☆12Oct 25, 2015Updated 10 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation☆337Sep 21, 2011Updated 14 years ago
- Docker image builder for eXist-db☆13Mar 16, 2021Updated 5 years ago
- Greek Syntax - Query the Greek New Testament with XQuery, XPath, and Python in Jupyter Notebooks☆11Aug 11, 2021Updated 4 years ago
- Break up your haproxy configs and join them together☆57Sep 7, 2012Updated 13 years ago
- An extension for eXist-db that allows the reading and writing of MARC into and out from the database☆11Mar 6, 2016Updated 10 years ago
- simple HTTP proxy based on tproxy☆27May 5, 2011Updated 15 years ago
- Sass plugin implementing TailwindCSS functions☆10Nov 28, 2023Updated 2 years ago
- Ruby Linear Algebra Library☆108Jun 7, 2009Updated 16 years ago