mikemccand / chromium-compact-language-detector
Automatically exported from code.google.com/p/chromium-compact-language-detector
☆160 Updated 4 years ago
Related projects
Alternatives and complementary repositories for chromium-compact-language-detector
- Compact Language Detector 2 ☆843 Updated 3 years ago
- Language Detection with Infinity-gram ☆231 Updated 9 years ago
- This is a mirror of the script by Giuseppe Attardi, and contains history before the official repo started: https://github.com/attardi/wik… ☆258 Updated 8 years ago
- Carrot2 plugin for ElasticSearch ☆292 Updated last year
- Score documents with pure dot product / cosine similarity with ES ☆250 Updated 3 years ago
- Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages ☆539 Updated 3 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 8 years ago
- This tool extracts word vectors from Lucene index. ☆134 Updated 6 years ago
- An efficient simhash implementation for python ☆124 Updated 5 years ago
- Text classification using Naive Bayes and Elasticsearch ☆154 Updated 8 years ago
- Html Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com ☆343 Updated 5 years ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector ☆251 Updated 6 years ago
- Elasticsearch Index Termlist ☆117 Updated 5 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD] ☆109 Updated 11 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 Updated 7 years ago
- Heuristic based boilerplate removal tool ☆727 Updated 6 months ago
- CMU ARK Twitter Part-of-Speech Tagger ☆574 Updated 10 months ago
- Elasticsearch entity resolution plugin based on Duke ☆210 Updated 4 years ago
- "Stop worrying about Elasticsearch analyzers", my therapist says ☆154 Updated 3 years ago
- This is a fork of the Stanford Named Entity Recognizer with added support for deploying in Java servlet mode. See github.com/dat/pyner fo… ☆90 Updated 11 years ago
- ☆184 Updated 5 years ago
- Simhash and near-duplicate detection ☆409 Updated last year
- A Python library to detect and extract listing data from HTML pages. ☆109 Updated 7 years ago
- Python interface to the Stanford Named Entity Recognizer ☆293 Updated 3 years ago
- Fast multi-keyword search engine for text strings ☆247 Updated last month
- ☆97 Updated 3 years ago
- NEWS: JATE2.0 Beta.11 Released, see details below. ☆81 Updated last year
- A text tagger based on Lucene / Solr, using FST technology ☆174 Updated 10 months ago
- Github mirror of "search/highlighter" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access… ☆100 Updated 5 months ago