A cluster implementation of simhash near-duplicate detection
☆32 · Mar 11, 2015 · Updated 10 years ago
Alternatives and similar repositories for simhash-cluster
Users who are interested in simhash-cluster are comparing it to the libraries listed below.
- Python API for Various DB-Backed Simhash Clusters ☆64 · Mar 16, 2017 · Updated 8 years ago
- Gevent Crawling in Python, with Utilities ☆22 · Mar 12, 2015 · Updated 10 years ago
- ☆14 · Aug 24, 2021 · Updated 4 years ago
- Parser for KAF NAF files written in Python ☆16 · Jul 1, 2021 · Updated 4 years ago
- Simple log library for Golang ☆14 · Jun 25, 2015 · Updated 10 years ago
- TreeDict is a fast, flexible and full-featured hierarchical Python container that makes simple and sophisticated bookkeeping easy. ☆33 · Apr 14, 2016 · Updated 9 years ago
- Website for standardized execution and evaluation of algorithms on datasets. ☆36 · Nov 14, 2019 · Updated 6 years ago
- Python 3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse… ☆23 · Sep 15, 2023 · Updated 2 years ago
- My implementation of Explicit Semantic Analysis (ESA) library that we used at KMi, Open University to produce our submission at the NTCIR… ☆36 · Sep 22, 2015 · Updated 10 years ago
- Ubiflux Vigor ventilation system RS485 Modbus communications with Python ☆11 · Feb 20, 2026 · Updated last week
- Compute association strength over semantic networks in a dimensionality-reduced form. ☆32 · Aug 14, 2015 · Updated 10 years ago
- A collection of documents and materials for the EMNLP-2015 Semantic Similarity tutorial ☆30 · Sep 30, 2015 · Updated 10 years ago
- Cloud Mining automatically builds exploratory faceted search systems. ☆52 · Oct 15, 2013 · Updated 12 years ago
- A minimal demo web framework based on servlets ☆10 · Sep 3, 2015 · Updated 10 years ago
- The goal of this experiment is to take articles and certain metadata and group them by topic. ☆11 · Apr 14, 2016 · Updated 9 years ago
- QUAC ("quantitative analysis of chatter" or any related acronym you like) is a package for acquiring and analyzing social Internet conten… ☆68 · Jun 5, 2020 · Updated 5 years ago
- Re-usable wrapper scripts for text document extractors. ☆37 · Jun 18, 2016 · Updated 9 years ago
- Bicycle Incident reporting ☆13 · Jul 22, 2022 · Updated 3 years ago
- ClassicUO - an open source implementation of the Ultima Online Classic Client. ☆11 · Sep 22, 2025 · Updated 5 months ago
- An open-source news aggregator ☆15 · Sep 9, 2016 · Updated 9 years ago
- ☆12 · Oct 25, 2015 · Updated 10 years ago
- Digitization information system built on top of Fedora repository ☆16 · Jan 15, 2019 · Updated 7 years ago
- An automated testing framework for API services ☆11 · Jan 16, 2026 · Updated last month
- Focused Crawler for VT's CTRNet ☆10 · May 13, 2013 · Updated 12 years ago
- ☆11 · Dec 21, 2023 · Updated 2 years ago
- Green SQLAlchemy extensions for pulsar ☆11 · Nov 24, 2017 · Updated 8 years ago
- Hungarian tokenizer. ☆14 · Mar 15, 2022 · Updated 3 years ago
- ☆65 · Jan 30, 2020 · Updated 6 years ago
- ETL project to download and process CME open interest data, COT data from the CFTC and NAV/shares-outstanding data from various ETF … ☆12 · Jul 13, 2021 · Updated 4 years ago
- LODmilla - a graph-based Linked Open Data browser ☆18 · Apr 5, 2017 · Updated 8 years ago
- ☆13 · Aug 6, 2019 · Updated 6 years ago
- BlogBridge, the cross-platform, open-source blog and RSS reader with super powers! ☆29 · Nov 2, 2011 · Updated 14 years ago
- The Robots Exclusion Protocol, or robots.txt protocol, is a convention to prevent cooperating web spiders and other web robots from access… ☆16 · Mar 20, 2018 · Updated 7 years ago
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset. ☆41 · Nov 8, 2013 · Updated 12 years ago
- PicoTTS wrapper for NodeJS. PicoTTS is used by Android; it is extremely lightweight and fast, yet produces very natural voices. ☆16 · Apr 23, 2014 · Updated 11 years ago
- Spring integration with Stardog RDF database ☆18 · Jan 27, 2025 · Updated last year
- ☆11 · Dec 26, 2022 · Updated 3 years ago
- Taws - A personal and private web search engine ☆24 · Feb 20, 2015 · Updated 11 years ago
- A collection of various discourse segmenters ☆10 · Jun 30, 2017 · Updated 8 years ago