rflynn / regroup
Generate a regular expression that describes a set of strings.
☆31Updated 2 years ago
Alternatives and similar repositories for regroup
Users who are interested in regroup are comparing it to the libraries listed below.
- Common Crawl Index Server☆70Updated 7 months ago
- Automatically extracts and normalizes an online article or blog post publication date☆117Updated 2 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons☆58Updated 6 months ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency.☆189Updated 3 years ago
- Python code and data for the post "Word Segmentation, or Makingsenseofthis"☆17Updated 2 years ago
- Fast multi-keyword search engine for text strings☆257Updated last year
- Knowledge extraction from web data☆92Updated 7 years ago
- Webrecorder's DevTools Protocol Automation Library☆17Updated 2 years ago
- A classifier for detecting soft 404 pages☆56Updated this week
- Framework for evaluating text extraction algorithms implemented as web services☆42Updated 13 years ago
- A simple proof-of-concept Levenshtein automaton in Python☆108Updated 10 years ago
- Compare html similarity using structural and style metrics☆214Updated 2 years ago
- A natural language semantic parser☆112Updated 7 years ago
- A generic crawler☆78Updated 7 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning☆119Updated last year
- Adaptive crawler which uses Reinforcement Learning methods☆168Updated 7 years ago
- A Python library to detect and extract listing data from an HTML page.☆108Updated 8 years ago
- Python module for creating n-grams from a chunk of text☆31Updated last year
- An index data structure for approximate string search.☆23Updated 6 years ago
- Creates github index for similar repositories discovery☆192Updated 9 years ago
- Quickly analyze and explore email with advanced analytics and visualization.☆56Updated 4 years ago
- A component that tries to avoid downloading duplicate content☆27Updated 7 years ago
- Get user ids from social network handlers☆12Updated 8 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.☆46Updated 7 years ago
- Find strings/words in text; convenience and C speed☆127Updated 3 years ago
- Extract the difference between two HTML pages☆32Updated 7 years ago
- Implementation of perceptual image hash calculation in Python☆132Updated last year
- Python search module for fast approximate string matching☆54Updated 2 years ago
- Show summary of a large number of URLs in a Jupyter Notebook☆17Updated 4 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3☆51Updated 10 years ago