javasoze / meaningfulweb
Web page content extractor
☆31Updated 12 years ago
Alternatives and similar repositories for meaningfulweb
Users that are interested in meaningfulweb are comparing it to the libraries listed below
- faceted search engine☆55Updated 11 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p…☆142Updated 3 years ago
- A Lazy Data Flow Framework (no longer active - see Apache TinkerPop)☆277Updated 3 years ago
- A fast and easy-to-use decision tree learner in Java☆233Updated 3 years ago
- Strata is the new open source analytics and market risk library from OpenGamma☆238Updated 7 years ago
- Human-Powered Data Analysis with Mechanical Turk☆300Updated 12 years ago
- Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase.☆174Updated 12 years ago
- A Java library that manages component action/event bindings for MVC patterns☆115Updated last week
- Tool to help users migrate large relational databases into Hadoop clusters.☆67Updated 13 years ago
- Java implementation of a probabilistic set data structure☆144Updated 8 years ago
- Use Solr clients/tools with ElasticSearch☆77Updated 12 years ago
- Common Crawl support library to access 2008-2012 crawl archives (ARC files)☆501Updated 7 years ago
- Bulk loading for elastic search☆185Updated last year
- Document clustering based on Latent Semantic Analysis☆96Updated 15 years ago
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.☆158Updated 2 years ago
- HTML Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com☆343Updated 5 years ago
- A Property Graph Algorithms Package (no longer active - see Apache TinkerPop)☆96Updated 4 years ago
- A semantic web crawler☆20Updated 14 years ago
- This is a prototype app that stores items into a Hazelcast map and queue based on the description in https://wiki.mozilla.org/Socorro:Clie…☆17Updated 14 years ago
- SIREn - Semi-Structured Information Retrieval Engine☆107Updated 4 years ago
- Some utilities for Lucene☆110Updated 12 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.☆282Updated 7 years ago
- A distributed task queue worker designed for throughput, parallelism, and clustering.☆238Updated 2 years ago
- ☆309Updated 4 years ago
- Library to use Kestrel as a spout within Storm☆134Updated 8 years ago
- Jline 1.x; The Jline readline simulator for Java☆48Updated 8 years ago
- A command-line twitter client with smart filtering and statistical classification☆165Updated 14 years ago
- Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/…☆134Updated 15 years ago
- A new communications experience for the enterprise☆170Updated 11 years ago
- Demo visualization of Neo4j data☆162Updated 8 years ago