metzlerd/mavuno

Mavuno: A Hadoop-Based Text Mining Toolkit

☆48

Alternatives and similar repositories for mavuno

Users interested in mavuno are comparing it to the libraries listed below.

kevinweil / FileSetInputFormat
A Hadoop input format for sending lists of files as keys to a mapper. Set the list of files, and an input split will be created per file…
☆16 · Apr 7, 2010 · Updated 16 years ago
DigitalPebble / behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
☆282 · Apr 25, 2018 · Updated 8 years ago
lintool / Ivory
A Hadoop toolkit for web-scale information retrieval research
☆87 · Dec 12, 2014 · Updated 11 years ago
tdunning / pig-vector
Mahout vector encoding for Pig
☆53 · Nov 20, 2022 · Updated 3 years ago
elazarl / hadoop_rpc_walktrhough
What happens on the wire when Hadoop RPC call is issued?
☆13 · Jul 1, 2022 · Updated 4 years ago
lintool / Cloud9
Cloud9 is a Hadoop toolkit for working with big data
☆237 · Dec 15, 2015 · Updated 10 years ago
pranab / visitante
Set of Hadoop, Spark, and Storm based tools for web and customer analytics
☆34 · Jun 7, 2021 · Updated 5 years ago
lintool / twitter-tools
Twitter Tools
☆222 · Feb 18, 2018 · Updated 8 years ago
cloudera / emailarchive
Hadoop for archiving email
☆22 · Sep 29, 2011 · Updated 14 years ago
tomslabs / avro-utils
Utilities to use Avro files from Hadoop Map/Reduce jobs and Streaming
☆26 · Sep 10, 2013 · Updated 12 years ago
datasalt / pangool
Tuple MapReduce for Hadoop: Hadoop API made easy
☆57 · Jun 27, 2022 · Updated 4 years ago
alienrobotwizard / varaha
Machine learning and natural language processing with Apache Pig
☆53 · Dec 17, 2013 · Updated 12 years ago
heathermiller / menthor
Parallelizing Machine Learning-- Functionally.
☆56 · Jun 14, 2012 · Updated 14 years ago
LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
tadglines / Socket.IO-Java
☆38 · Sep 3, 2012 · Updated 13 years ago
aparo / elasticsearch
Open Source, Distributed, RESTful Search Engine - Paro Edition - I add opinionated stuff that will never be in ElasticSearch. Tracking st…
☆31 · Oct 30, 2016 · Updated 9 years ago
wpm / Hadoop-GATE
A Hadoop job that runs GATE applications
☆15 · Oct 16, 2013 · Updated 12 years ago
pierre / sweeper
Hadoop utility to quickly find large directories to clean up or small files to combine.
☆15 · Jan 12, 2012 · Updated 14 years ago
vmware-archive / training
☆19 · Mar 24, 2022 · Updated 4 years ago
zinniasystems / Nectar
Open source framework for predictive modeling on Apache Hadoop
☆34 · Aug 23, 2014 · Updated 11 years ago
signal-ai / Signal-1M-Tools
☆50 · Sep 3, 2019 · Updated 6 years ago
pranab / fluxua
A simple, easy-to-use Hadoop MapReduce workflow engine
☆18 · Mar 30, 2012 · Updated 14 years ago
felipemoraes / se-recsys-paperswithcode
Search and Recommender Systems papers with Code
☆21 · Nov 17, 2018 · Updated 7 years ago
ogrisel / pignlproc
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
☆163 · Nov 8, 2022 · Updated 3 years ago
tdunning / knn
Large scale k-nn experiments
☆69 · Jul 31, 2024 · Updated last year
vietansegan / sits
Speaker Identity for Topic Segmentation (SITS)
☆13 · Dec 14, 2014 · Updated 11 years ago
jpatanooga / Metronome
Suite of parallel iterative algorithms built on top of Iterative Reduce
☆111 · Jun 24, 2014 · Updated 12 years ago
YahooArchive / xpath_proto_builder
xpath_proto_builder is a library to convert objects (JSON, XML, POJO) into protobuf using xpath notation.
☆19 · Nov 15, 2022 · Updated 3 years ago
Cascading / fluid
A Fluent Java API for Cascading
☆22 · Jun 14, 2017 · Updated 9 years ago
mewo2 / musichackathon
EMI Music Hackathon entry
☆23 · Jul 30, 2012 · Updated 13 years ago
jpatanooga / Lumberyard
iSAX Indexing persisted in HBase
☆39 · Jul 26, 2011 · Updated 15 years ago
lodqa / lodqa
A system to generate SPARQL queries from natural language queries.
☆30 · Feb 15, 2025 · Updated last year
PRImA-Research-Lab / prima-core-libs
Core libraries by the PRImA Research Lab
☆16 · Jul 30, 2024 · Updated last year
atbrox / Snabler
Parallel Algorithms in Python for Hadoop/MapReduce
☆55 · Aug 10, 2012 · Updated 13 years ago
cgivre / drillworkshop
Repository for the Apache Drill Workshop
☆18 · Oct 31, 2016 · Updated 9 years ago
joestein / amaunet
Python Streaming Example
☆17 · Dec 29, 2014 · Updated 11 years ago
asafamr / SymPatternWSI
Word Sense Induction with neural Bi-language Models and symmetric patterns
☆12 · Aug 31, 2018 · Updated 7 years ago
kasnerz / d2t_iterative_editing
Code for the paper "Data-to-Text Generation with Iterative Text Editing"
☆14 · Mar 23, 2021 · Updated 5 years ago
jongwook / spark-ranking-metrics
Offline Recommender System Evaluation for Spark
☆29 · Jul 2, 2017 · Updated 9 years ago