rflynn / regroup
Generate a regular expression that describes a set of strings.
☆29 Updated 2 years ago
Alternatives and similar repositories for regroup:
Users who are interested in regroup are comparing it to the libraries listed below:
- Common Crawl Index Server ☆66 Updated 2 weeks ago
- Python search module for fast approximate string matching ☆54 Updated 2 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons ☆58 Updated 3 years ago
- Knowledge extraction from web data ☆92 Updated 6 years ago
- Extract the difference between two HTML pages ☆32 Updated 6 years ago
- Natural Language Generator for Python ☆27 Updated 7 years ago
- Polyglot skipgram embeddings, and their many health benefits ☆12 Updated 5 years ago
- Crawler that retrieves commoncrawl's crawled hosts and their corresponding IPs ☆17 Updated 6 months ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated 5 months ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆189 Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Webrecorders DevTools Protocol Automation Library ☆17 Updated 2 years ago
- A pure Python implementation of the Aho-Corasick algorithm. ☆22 Updated 6 years ago
- A pure Python MurmurHash3 implementation. ☆68 Updated 5 years ago
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 Updated 3 years ago
- Automatically exported from code.google.com/p/guess-language ☆53 Updated last year
- Simple heuristic for measuring web page similarity (& data set) ☆90 Updated 6 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 Updated 9 years ago
- It finds best synonyms from Google Books when you press a hotkey ☆30 Updated 10 years ago
- Algorithms for "schema matching" ☆26 Updated 8 years ago
- An index data structure for approximate string search. ☆23 Updated 5 years ago
- Framework for making streamcorpus data ☆11 Updated 7 years ago
- Napkin is a simple tool to produce statistical analysis of a text ☆12 Updated 11 months ago
- Creates github index for similar repositories discovery ☆192 Updated 8 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- Streaming web crawler with WebSocket API ☆44 Updated last year
- Scrapy middleware for the autologin ☆37 Updated 6 years ago
- Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search appl… ☆65 Updated 6 months ago