A cluster implementation of simhash near-duplicate detection
☆32Mar 11, 2015Updated 11 years ago
Alternatives and similar repositories for simhash-cluster
Users who are interested in simhash-cluster are comparing it to the libraries listed below.
- Gevent Crawling in Python, with Utilities☆22Mar 12, 2015Updated 11 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Mar 16, 2017Updated 9 years ago
- A Pinterest-like application that also involves LBS and SNS.☆92Dec 6, 2013Updated 12 years ago
- Our repository for taking part in the big data competition, focusing on recommendation systems.☆17May 24, 2016Updated 10 years ago
- ☆14Aug 24, 2021Updated 4 years ago
- TreeDict is a fast, flexible and full-featured hierarchical Python container that makes simple and sophisticated bookkeeping easy.☆33Apr 14, 2016Updated 10 years ago
- A minimal demo web framework based on servlets☆10Sep 3, 2015Updated 10 years ago
- Parser for KAF NAF files written in Python☆16Jul 1, 2021Updated 5 years ago
- A C implementation of a Boldi-Vigna graph decompressor☆17Jul 5, 2016Updated 9 years ago
- Topic Detection from English text using BERT + Bi-GRU + CRF☆14Feb 11, 2020Updated 6 years ago
- Website for standardized execution and evaluation of algorithms on datasets.☆36Nov 14, 2019Updated 6 years ago
- Python 3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse…☆23Sep 15, 2023Updated 2 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…☆53Jun 12, 2020Updated 6 years ago
- A collection of documents and materials for the EMNLP-2015 Semantic Similarity tutorial☆30Sep 30, 2015Updated 10 years ago
- A Python implementation of DEPTA☆83Jan 14, 2017Updated 9 years ago
- Classify a job description (or noisy job title) into an ONET job title☆19Oct 14, 2016Updated 9 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form.☆32Aug 14, 2015Updated 10 years ago
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th…☆45Jun 21, 2022Updated 4 years ago
- A distributed system for mining common crawl using SQS, AWS-EC2 and S3☆22Jun 24, 2014Updated 12 years ago
- QUAC ("quantitative analysis of chatter" or any related acronym you like) is a package for acquiring and analyzing social Internet conten…☆68Jun 5, 2020Updated 6 years ago
- My implementation of an Explicit Semantic Analysis (ESA) library that we used at KMi, Open University to produce our submission at the NTCIR…☆38Sep 22, 2015Updated 10 years ago
- Cloud Mining automatically builds exploratory faceted search systems.☆52Oct 15, 2013Updated 12 years ago
- Re-usable wrapper scripts for text document extractors.☆37Jun 18, 2016Updated 10 years ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot☆13Dec 18, 2020Updated 5 years ago
- Open Source Implementation of Simhash in Python☆24Sep 14, 2017Updated 8 years ago
- Source real estate prices from the Common Crawl.☆27Oct 22, 2018Updated 7 years ago
- Kafka Gateway (gRPC/protobuf + http/json)☆40Oct 19, 2018Updated 7 years ago
- Ubiflux Vigor ventilation system RS485 Modbus communications with Python☆12Feb 20, 2026Updated 4 months ago
- agent has moved to https://lab.allmende.io/valueflows/agent☆10Jun 23, 2020Updated 6 years ago
- Example of how to learn vector representations of words in Python using Gensim on English Wikipedia articles.☆25Jul 3, 2016Updated 10 years ago
- Visual SPARQL query tool☆10Feb 26, 2016Updated 10 years ago
- The magyarlanc toolkit aims at the basic linguistic processing of Hungarian texts. The toolkit consists of only Java modules (the…☆13Jun 21, 2016Updated 10 years ago
- A super-simple HTTP API interface built on leveldb☆10Dec 23, 2015Updated 10 years ago
- This project deals with hierarchical classification of web pages based on dmoz dataset.☆14Apr 10, 2014Updated 12 years ago
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Scraper built with Scrapy.☆18Jun 25, 2026Updated last week
- CROMER (CROss-document Main Events and entities Recognition) is a tool for cross-document coreference☆12Jan 14, 2015Updated 11 years ago
- Chambua is an open-source semantic tagging application that analyses text and extracts names of people, places (& geocodes them), organis…☆33Nov 12, 2021Updated 4 years ago
- Stream Processing ToolKit☆17Aug 14, 2015Updated 10 years ago