An efficient simhash implementation for python
☆128Oct 25, 2019Updated 6 years ago
Alternatives and similar repositories for python-simhash
Users that are interested in python-simhash are comparing it to the libraries listed below.
- Simhash and near-duplicate detection☆424May 15, 2023Updated 2 years ago
- Scrapy spider middleware to clean up query parameters in request URLs☆24Jun 30, 2016Updated 9 years ago
- Simhashing in C++☆136Feb 14, 2023Updated 3 years ago
- Tool to flatten stream of JSON-like objects, configured via schema☆33Oct 19, 2019Updated 6 years ago
- A python implementation of DEPTA☆83Jan 14, 2017Updated 9 years ago
- Find which links on a web page are pagination links☆29Jan 12, 2017Updated 9 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations☆39May 21, 2024Updated last year
- Python API for Various DB-Backed Simhash Clusters☆64Mar 16, 2017Updated 9 years ago
- Automatic Item List Extraction☆86Jun 15, 2016Updated 9 years ago
- Similarity hashing☆49Jul 19, 2011Updated 14 years ago
- A project to attempt to automatically login to a website given a single seed☆11Jun 17, 2024Updated last year
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex☆19Nov 18, 2022Updated 3 years ago
- A fast python implementation of the SimHash algorithm.☆27Oct 27, 2021Updated 4 years ago
- Paginating the web☆37Feb 11, 2014Updated 12 years ago
- Find community/segment in an attributed graph of Facebook data.☆18Apr 20, 2017Updated 8 years ago
- NER toolkit for HTML data☆259May 3, 2024Updated last year
- Django feeds provides an extensive database model for RSS feeds and a fault tolerant parser.☆30Jun 14, 2012Updated 13 years ago
- A pure-python HTML screen-scraping library☆1,887Apr 4, 2022Updated 4 years ago
- vertical search crawler☆38Jan 9, 2012Updated 14 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash☆26May 8, 2009Updated 16 years ago
- Distributed Web Crawler, Parser and Search Engine.☆10Jun 16, 2016Updated 9 years ago
- Graphical techniques for text mining.☆19Jun 12, 2015Updated 10 years ago
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions☆31Jun 7, 2020Updated 5 years ago
- A platform for collecting, analyzing, and visualizing social media data.☆13Dec 27, 2020Updated 5 years ago
- Graph tool is a very powerful tool for working with Graphs in C++ or Python. In this repo I exported the Quick start tutorial in their do…☆24May 28, 2020Updated 5 years ago
- Code for paper Document-Level Paraphrase Generation with Sentence Rewriting and Reordering by Zhe Lin, Yitao Cai and Xiaojun Wan. This pa…☆26Nov 10, 2021Updated 4 years ago
- Relatively simple text classification powered by spaCy☆41Oct 20, 2015Updated 10 years ago
- spaCy-to-naf converter☆21Jun 10, 2025Updated 10 months ago
- SpamBayes spam classifier written in Python☆19Jun 12, 2023Updated 2 years ago
- This repo contains the code and results for reproducing the results in the paper: A SIMPLE BUT TOUGH-TO-BEAT BASELINE FOR SENTENCE EMBEDD…☆12Jul 13, 2018Updated 7 years ago
- Pikes is a Knowledge Extraction Suite☆23Nov 14, 2023Updated 2 years ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…☆55May 21, 2024Updated last year
- A lightweight python actor framework☆19Jan 29, 2016Updated 10 years ago
- a testimonials app for Django☆27Jun 19, 2021Updated 4 years ago
- A Text Comprehension Engine in Python☆15Aug 23, 2015Updated 10 years ago
- Example Python code for comparing documents using MinHash☆251Feb 11, 2019Updated 7 years ago
- A framework for PSL inference.☆21Nov 9, 2015Updated 10 years ago
- Sign Nodemailer mail using S/MIME☆13Oct 15, 2020Updated 5 years ago
- Restrict crawl and scraping scope using matchers.☆26Jun 8, 2016Updated 9 years ago