jdf / cue.language
A small Java library for simple text analysis - counting strings, identifying languages, and removing stop words.
☆156 Updated 5 years ago
Alternatives and similar repositories for cue.language:
Users who are interested in cue.language are comparing it to the libraries listed below.
- A small Java library for simple text analysis - counting strings, identifying languages, and removing stop words. ☆59 Updated 7 years ago
- [not maintained] Custom Twitter Search via ElasticSearch & Wicket ☆61 Updated 4 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆281 Updated 6 years ago
- SIREn - Semi-Structured Information Retrieval Engine ☆107 Updated 3 years ago
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. ☆158 Updated 2 years ago
- Java implementation of a probabilistic set data structure ☆143 Updated 7 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆71 Updated 10 months ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 Updated 2 years ago
- A Java implementation of Twitter's text processing library ☆364 Updated 10 years ago
- ElasticSearch OSEM ☆22 Updated last year
- WARC (Web Archive) Input and Output Formats for Hadoop ☆35 Updated 10 years ago
- Provides support to increase developer productivity in Java when using a graph database like Neo4j. Uses familiar Spring concepts such a… ☆64 Updated 3 years ago
- Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JS… ☆155 Updated 7 years ago
- Find the Git commits you're looking for ☆118 Updated 2 years ago
- Various utilities regarding Levenshtein transducers. (Java) ☆57 Updated 3 years ago
- Common Crawl support library to access 2008-2012 crawl archives (ARC files) ☆495 Updated 7 years ago
- Machine learning and natural language processing with Apache Pig ☆53 Updated 11 years ago
- Java/JNI bindings to libpostal for fast international street address parsing/normalization ☆112 Updated 10 months ago
- Bulk loading for elastic search ☆185 Updated last year
- Code examples for my book "Practical Semantic Web Programming (Java, Scala, Clojure, and JRuby Edition)" ☆77 Updated last year
- Java text categorization system ☆55 Updated 7 years ago
- A port of the arclabs 'readability' package to Java ☆72 Updated 12 years ago
- A library that adds some NLP capabilities to the Lucene search engine ☆50 Updated 11 years ago
- Java framework for Google App Engine ☆80 Updated 5 years ago
- A fast and easy to use decision tree learner in java ☆232 Updated 2 years ago
- Pure Java implementation of the liblzo2 LZO compression algorithm ☆47 Updated 13 years ago
- Eclipse plugin for Apache Pig ☆33 Updated 11 years ago
- Gradle plugin for automated release management. ☆54 Updated 11 years ago
- Siena is a persistence API for Java inspired by the Google App Engine Python Datastore API ☆81 Updated 2 years ago
- do all first links on wikipedia _really_ lead to philosophy? ☆22 Updated 12 years ago