mikemccand / chromium-compact-language-detector
Automatically exported from code.google.com/p/chromium-compact-language-detector
☆160 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for chromium-compact-language-detector
- Compact Language Detector 2 ☆844 · Updated 3 years ago
- Language Detection with Infinity-gram ☆231 · Updated 9 years ago
- Heuristic-based boilerplate removal tool ☆729 · Updated 6 months ago
- Simhash and near-duplicate detection ☆410 · Updated last year
- Python API for Various DB-Backed Simhash Clusters ☆64 · Updated 7 years ago
- Carrot2 plugin for Elasticsearch ☆292 · Updated last year
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD] ☆109 · Updated 11 years ago
- Fast multi-keyword search engine for text strings ☆247 · Updated 2 months ago
- Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages ☆539 · Updated 3 years ago
- An efficient simhash implementation for Python ☆125 · Updated 5 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 · Updated 7 years ago
- This is a fork of the Stanford Named Entity Recognizer with added support for deploying in Java servlet mode. See github.com/dat/pyner fo… ☆90 · Updated 11 years ago
- A simple proof-of-concept Levenshtein automaton in Python ☆107 · Updated 9 years ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆269 · Updated 2 years ago
- Additional OpenNLP mapping type for Elasticsearch in order to perform named entity recognition ☆136 · Updated 8 years ago
- Quickly extract multi-word phrases from a corpus ☆191 · Updated 4 years ago
- GitHub mirror of "search/highlighter" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access… ☆100 · Updated this week
- Pysolr — Python Solr client ☆667 · Updated 2 weeks ago
- A text tagger based on Lucene / Solr, using FST technology ☆176 · Updated 11 months ago
- Quality information extraction at web scale. ☆327 · Updated 7 years ago
- Score documents with pure dot product / cosine similarity with ES ☆250 · Updated 3 years ago
- Python extension module for accelerating regular expressions using libesm ☆132 · Updated last year
- A large-scale statistical machine translation system written in Java. ☆208 · Updated 2 years ago
- ☆214 · Updated 2 years ago
- ☆797 · Updated last year
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆365 · Updated last year
- ☆339 · Updated last year
- ☆151 · Updated 4 years ago
- Text classification using Naive Bayes and Elasticsearch ☆154 · Updated 8 years ago