A cluster implementation of simhash near-duplicate detection
☆32 · Mar 11, 2015 · Updated 11 years ago
Alternatives and similar repositories for simhash-cluster
Users that are interested in simhash-cluster are comparing it to the libraries listed below.
- Gevent Crawling in Python, with Utilities ☆22 · Mar 12, 2015 · Updated 11 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Mar 16, 2017 · Updated 9 years ago
- simple log library for Golang ☆14 · Jun 25, 2015 · Updated 10 years ago
- A repository for our entry in the big data competition, focusing on recommendation systems. ☆17 · May 24, 2016 · Updated 9 years ago
- ☆14 · Aug 24, 2021 · Updated 4 years ago
- Parser for KAF NAF files written in Python ☆16 · Jul 1, 2021 · Updated 4 years ago
- Common configurations and tools ☆29 · Sep 11, 2024 · Updated last year
- Multi-label tagging of movies ☆18 · Jan 30, 2015 · Updated 11 years ago
- A script that simplifies working with archetypes in Hugo! (@gohugoio) Also supports bulk file creation/editing via a single .csv! 🐍 ☆17 · Nov 15, 2021 · Updated 4 years ago
- DIY shooting training system ☆16 · Aug 18, 2024 · Updated last year
- A C implementation of a Boldi-Vigna graph decompressor ☆17 · Jul 5, 2016 · Updated 9 years ago
- Vocabulary Tree Code ☆71 · Aug 22, 2016 · Updated 9 years ago
- ☆17 · Jun 7, 2019 · Updated 6 years ago
- Website for standardized execution and evaluation of algorithms on datasets. ☆36 · Nov 14, 2019 · Updated 6 years ago
- Python 3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse… ☆23 · Sep 15, 2023 · Updated 2 years ago
- DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Jun 12, 2020 · Updated 5 years ago
- TalkingData AdTracking Fraud Detection Challenge (Kaggle competition) ☆13 · Sep 24, 2018 · Updated 7 years ago
- A collection of documents and materials for the EMNLP-2015 Semantic Similarity tutorial ☆30 · Sep 30, 2015 · Updated 10 years ago
- COMCMS is a simple CMS for Go. ☆18 · Feb 28, 2018 · Updated 8 years ago
- A one-stop solution to navigate the endless sea of online courses. ☆10 · Oct 17, 2021 · Updated 4 years ago
- A Python implementation of DEPTA ☆83 · Jan 14, 2017 · Updated 9 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form. ☆32 · Aug 14, 2015 · Updated 10 years ago
- A Lambda Layer that provides SQLite support in Python 3.6 Lambdas ☆25 · Nov 30, 2018 · Updated 7 years ago
- Time series forecasting using Facebook's Prophet and Apache Spark ☆14 · Dec 9, 2019 · Updated 6 years ago
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th… ☆45 · Jun 21, 2022 · Updated 3 years ago
- A distributed system for mining common crawl using SQS, AWS-EC2 and S3 ☆22 · Jun 24, 2014 · Updated 11 years ago
- ☆12 · May 14, 2025 · Updated 10 months ago
- QUAC ("quantitative analysis of chatter" or any related acronym you like) is a package for acquiring and analyzing social Internet conten… ☆68 · Jun 5, 2020 · Updated 5 years ago
- My implementation of Explicit Semantic Analysis (ESA) library that we used at KMi, Open University to produce our submission at the NTCIR… ☆36 · Sep 22, 2015 · Updated 10 years ago
- Shared rbtree (red-black tree) for ngx-lua ☆21 · Jun 7, 2016 · Updated 9 years ago
- A demo of Bayesian neural networks, using SVI and HMC. ☆13 · Sep 12, 2019 · Updated 6 years ago
- Simple supervisor to run daemons ☆17 · Oct 30, 2024 · Updated last year
- Cloud Mining automatically builds exploratory faceted search systems. ☆52 · Oct 15, 2013 · Updated 12 years ago
- Proxies sd_notify messages between systemd and a process in a different cgroup ☆17 · Jul 31, 2018 · Updated 7 years ago
- ☆14 · Nov 25, 2023 · Updated 2 years ago
- AST normalization experiment ☆45 · Jan 31, 2019 · Updated 7 years ago
- Scraper for real estate listings on Trulia.com implemented in Python with Scrapy ☆20 · Nov 1, 2019 · Updated 6 years ago
- Tornado OAuth 2 client ☆17 · Dec 20, 2022 · Updated 3 years ago
- Dragon Nest (龙之谷) ☆14 · Apr 25, 2017 · Updated 8 years ago