A cluster implementation of simhash near-duplicate detection
☆32Mar 11, 2015Updated 11 years ago
Alternatives and similar repositories for simhash-cluster
Users that are interested in simhash-cluster are comparing it to the libraries listed below.
- Gevent Crawling in Python, with Utilities☆22Mar 12, 2015Updated 11 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Mar 16, 2017Updated 9 years ago
- ☆14Aug 24, 2021Updated 4 years ago
- TreeDict is a fast, flexible and full-featured hierarchical python container that makes simple and sophisticated bookkeeping easy.☆33Apr 14, 2016Updated 10 years ago
- Find duplicate files on your computer☆21Jun 6, 2020Updated 5 years ago
- A minimal demo web framework based on servlets☆10Sep 3, 2015Updated 10 years ago
- Parser for KAF NAF files written in Python☆16Jul 1, 2021Updated 4 years ago
- Common configurations and tools☆29Sep 11, 2024Updated last year
- Simhashing in C++☆136Feb 14, 2023Updated 3 years ago
- Multi-label tagging of movies☆18Jan 30, 2015Updated 11 years ago
- A script that simplifies working with archetypes in Hugo! (@gohugoio) Also supports bulk file creation/editing via a single .csv! 🐍☆17Nov 15, 2021Updated 4 years ago
- Simhash and near-duplicate detection☆423May 15, 2023Updated 2 years ago
- Module to create static html reports☆14Jul 2, 2024Updated last year
- For the filthiest web scrapers that have no time for rate-limits.☆19Oct 11, 2020Updated 5 years ago
- A C implementation of a Boldi-Vigna graph decompressor☆17Jul 5, 2016Updated 9 years ago
- Vocabulary Tree Code☆71Aug 22, 2016Updated 9 years ago
- Shader notes, reference books, and documentation☆11Jan 25, 2024Updated 2 years ago
- Website for standardized execution and evaluation of algorithms on datasets.☆36Nov 14, 2019Updated 6 years ago
- python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse…☆23Sep 15, 2023Updated 2 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…☆52Jun 12, 2020Updated 5 years ago
- TalkingData AdTracking Fraud Detection Challenge on Kaggle Competition☆13Sep 24, 2018Updated 7 years ago
- Hands-On Quantum Information Processing with Python, published by Packt☆20Oct 31, 2022Updated 3 years ago
- ☆13Jul 21, 2016Updated 9 years ago
- A one stop solution to navigate the endless sea of online courses.☆10Oct 17, 2021Updated 4 years ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/googleapis-common-protos☆20Jan 10, 2025Updated last year
- A python implementation of DEPTA☆83Jan 14, 2017Updated 9 years ago
- Annotated source-code analysis of Mesos, Apache's open-source distributed resource management framework; adds mesos_framework_demo with detailed comments☆14Sep 21, 2017Updated 8 years ago
- A Python application which is used to pull data from the United States Federal Treasury API.☆16Jul 15, 2021Updated 4 years ago
- classify a job description (or noisy job title) into a ONET job title☆19Oct 14, 2016Updated 9 years ago
- react-ts-antd-template☆10Mar 27, 2020Updated 6 years ago
- The Redis protocol on top of LevelDB, written in Go (WIP)☆58Jan 4, 2014Updated 12 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form.☆32Aug 14, 2015Updated 10 years ago
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th…☆45Jun 21, 2022Updated 3 years ago
- ☆12May 14, 2025Updated 11 months ago
- QUAC ("quantitative analysis of chatter" or any related acronym you like) is a package for acquiring and analyzing social Internet conten…☆68Jun 5, 2020Updated 5 years ago
- ClassicUO - an open source implementation of the Ultima Online Classic Client.☆11Sep 22, 2025Updated 7 months ago
- WeChat robot with blockchain support☆11Jan 3, 2023Updated 3 years ago
- ☆21Updated this week
- My implementation of Explicit Semantic Analysis (ESA) library that we used at KMi, Open University to produce our submission at the NTCIR…☆36Sep 22, 2015Updated 10 years ago