ogrisel/pignlproc

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

☆163

Alternatives and similar repositories for pignlproc

Users who are interested in pignlproc are comparing it to the libraries listed below.

insideout10 / stanbol-freeling
Stanbol Freeling is an external HTTP based providing API to Stanbol NLP Engines.
☆22 · Jul 16, 2016 · Updated 10 years ago
jpatanooga / Caduceus
Set of example algorithm implementations focused on statistics and machine learning
☆31 · Apr 11, 2011 · Updated 15 years ago
alienrobotwizard / sounder
A grouping of Apache Pig examples.
☆65 · Oct 13, 2020 · Updated 5 years ago
iconara / piglet
Piglet is a DSL for writing Pig scripts in Ruby
☆83 · Jul 21, 2010 · Updated 16 years ago
YahooArchive / howl
Common metadata layer for Hadoop's Map Reduce, Pig, and Hive
☆77 · Feb 17, 2011 · Updated 15 years ago
tdunning / pig-vector
Mahout vector encoding for Pig
☆53 · Nov 20, 2022 · Updated 3 years ago
dbpedia-spotlight / pignlproc
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
☆17 · May 15, 2015 · Updated 11 years ago
datawrangling / spatialanalytics
Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/…
☆134 · Mar 31, 2010 · Updated 16 years ago
tdunning / Plume
Explorations relative to cloning FlumeJava
☆94 · Oct 13, 2020 · Updated 5 years ago
dbpedia-spotlight / dbpedia-spotlight
DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.
☆759 · Mar 8, 2018 · Updated 8 years ago
alienrobotwizard / varaha
Machine learning and natural language processing with Apache Pig
☆53 · Dec 17, 2013 · Updated 12 years ago
apache / stanbol
Mirror of Apache Stanbol (incubating)
☆118 · Feb 29, 2024 · Updated 2 years ago
rjurney / Cloud-Stenography
Main Repo
☆15 · Jun 24, 2010 · Updated 16 years ago
julienledem / Pig-scripting-examples
Examples of use of Pig scripting languages capabilities
☆39 · Aug 1, 2016 · Updated 9 years ago
LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
DigitalPebble / behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
☆283 · Apr 25, 2018 · Updated 8 years ago
hmason / tc
A command-line Twitter client with smart filtering and statistical classification
☆166 · Oct 18, 2010 · Updated 15 years ago
algoriffic / lsa4solr
Document clustering based on Latent Semantic Analysis
☆96 · Apr 29, 2010 · Updated 16 years ago
ogrisel / dbpediakit
Python utilities to work with the DBpedia dumps for analytics.
☆39 · May 11, 2012 · Updated 14 years ago
spullara / havrobase
Use Avro to store all your values in HBase instead of regular columns
☆76 · Dec 1, 2017 · Updated 8 years ago
romainr / PigEditor
Eclipse plugin for Apache Pig
☆33 · Jul 22, 2013 · Updated 13 years ago
infochimps-labs / wonderdog
Bulk loading for Elasticsearch
☆186 · Dec 16, 2023 · Updated 2 years ago
jatrost / hadoop-binary-analysis
Framework that makes processing arbitrary binary data in Hadoop easier
☆22 · Apr 8, 2013 · Updated 13 years ago
brendano / ark-tweet-nlp
CMU ARK Twitter Part-of-Speech Tagger
☆575 · Dec 17, 2023 · Updated 2 years ago
shilad / PyVowpal
Python wrapper for the Vowpal Wabbit machine learning library.
☆52 · Jul 19, 2013 · Updated 13 years ago
klbostee / dumbo
Python module that allows one to easily write and run Hadoop programs.
☆1,030 · Jan 9, 2018 · Updated 8 years ago
andrewclegg / sketchy
Simple approximate-nearest-neighbours in Python using locality sensitive hashing.
☆141 · Jun 21, 2012 · Updated 14 years ago
pprett / bolt
Bolt Online Learning Toolbox
☆87 · Oct 5, 2011 · Updated 14 years ago
toddstavish / Cassandra-Graph-Extract
Extracts A Social Network From Cassandra NoSQL Data-store To The InfiniteGraph Graph Database For Analysis
☆16 · Aug 26, 2010 · Updated 15 years ago
TAwarehouse / backup-hadoop-and-hive
☆21 · May 9, 2012 · Updated 14 years ago
lintool / Cloud9
Cloud9 is a Hadoop toolkit for working with big data
☆237 · Dec 15, 2015 · Updated 10 years ago
alad / Mekano
Building blocks for Information Retrieval & Machine Learning
☆16 · Oct 12, 2010 · Updated 15 years ago
LanceNorskog / LSH-Hadoop
Implementation of Tyler Neylon's Locality-Specific Hash based on simplex tessellations
☆28 · Oct 15, 2011 · Updated 14 years ago
matpalm / common-crawl-quick-hacks
common crawl quick hack examples
☆19 · Feb 11, 2015 · Updated 11 years ago
cloudera / emailarchive
Hadoop for archiving email
☆23 · Sep 29, 2011 · Updated 14 years ago
matpalm / common-crawl
playing around with the common crawl dataset
☆70 · Aug 18, 2012 · Updated 13 years ago
gparker / vowpal_wabbit
John Langford's original release of Vowpal Wabbit -- a fast online learning algorithm
☆57 · Aug 1, 2024 · Updated last year
japerk / nltk-trainer
Train NLTK objects with zero code
☆743 · Apr 13, 2020 · Updated 6 years ago
sudar / Yahoo_LDA
Yahoo!'s topic modelling framework using Latent Dirichlet Allocation
☆337 · Sep 21, 2011 · Updated 14 years ago