seomoz / simhash-cluster
A cluster implementation of simhash near-duplicate detection
☆32 Updated 9 years ago
Related projects
Alternatives and complementary repositories for simhash-cluster
- Distributed text analysis suite based on Celery ☆94 Updated last year
- Python API for Various DB-Backed Simhash Clusters ☆64 Updated 7 years ago
- An easy-install script for LibShortText ☆27 Updated 9 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆50 Updated 9 years ago
- Pure Python NLP toolkit ☆55 Updated 8 years ago
- An efficient simhash implementation for Python ☆124 Updated 5 years ago
- The Chinese NLP full-stack toolkit ☆41 Updated 9 years ago
- A Python implementation of DEPTA ☆83 Updated 7 years ago
- A GBDT (MART) and LambdaMART training and prediction package ☆15 Updated 9 years ago
- Tools for Chinese word segmentation and POS tagging written in Python ☆38 Updated 10 years ago
- Yet another Chinese word segmentation package based on character-based tagging heuristics and the CRF algorithm ☆243 Updated 11 years ago
- The experiment software underlying two papers published at ECIR-2015 and SEMEVAL-2015. ☆37 Updated 9 years ago
- C++ Ternary Search Tree implementation with Python bindings ☆43 Updated 6 years ago
- Examples of Recommendations powered by MapReduce and mrjob ☆56 Updated 12 years ago
- LASER: A Scalable Response Prediction Platform for Online Advertising ☆47 Updated 10 years ago
- Chinese tokenizer and new-word finder: a three-stage mechanical Chinese word segmentation algorithm and an out-of-vocabulary new-word discovery algorithm ☆95 Updated 8 years ago
- A readability parser which can extract the title, content, and images from HTML pages ☆86 Updated 4 years ago
- Output Scrapy statistics to graphite/carbon ☆54 Updated 11 years ago
- The hanLP part of the earlier hanLP-python-flask, split out on its own ☆60 Updated 6 years ago
- Replication software, data, and supplementary materials for the paper: O'Connor, Stewart and Smith, ACL-2013, "Learning to Extract Intern… ☆26 Updated 3 years ago
- Python bloom filter using Redis as a shared backend. ☆19 Updated 7 years ago
- Notes on Logistic Regression and OWLQN ☆26 Updated 7 years ago
- Query-Document Relevance ☆42 Updated 9 years ago
- A bot for paperweekly ☆30 Updated 7 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- This open source project is a Python wrapper for NLPIR. ☆82 Updated 9 years ago
- Paragraph Vector Implementation ☆56 Updated 7 years ago