rflynn / regroup
Generate a regular expression that describes a set of strings.
☆31 · Updated 3 years ago
Alternatives and similar repositories for regroup
Users that are interested in regroup are comparing it to the libraries listed below
- Common Crawl Index Server ☆71 · Updated 10 months ago
- Adaptive crawler which uses Reinforcement Learning methods ☆168 · Updated this week
- Python extension module for accelerating regular expressions using libesm ☆132 · Updated 2 years ago
- An efficient simhash implementation for python ☆127 · Updated 6 years ago
- Python code and data for the post "Word Segmentation, or Makingsenseofthis" ☆17 · Updated 3 years ago
- A classifier for detecting soft 404 pages ☆58 · Updated this week
- A classifier for detecting soft 404 pages ☆16 · Updated 3 years ago
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Updated this week
- Automatically extracts and normalizes an online article or blog post publication date ☆118 · Updated 2 years ago
- Find strings/words in text; convenience and C speed ☆126 · Updated 3 years ago
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆378 · Updated 3 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts ☆81 · Updated 7 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 · Updated 10 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning ☆119 · Updated this week
- Spam filtering made easy for you ☆144 · Updated 6 years ago
- extract difference between two html pages ☆32 · Updated this week
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe… ☆55 · Updated last year
- A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] ☆81 · Updated 9 years ago
- A simple proof of concept levenshtein automaton in Python ☆108 · Updated 10 years ago
- Framework for evaluating text extraction algorithms implemented as web services ☆42 · Updated 13 years ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆161 · Updated 5 years ago
- Web Content Extraction Through Machine Learning ☆185 · Updated 11 years ago
- A generic crawler ☆78 · Updated this week
- Locality-sensitive hashing algorithm for text similarity comparisons ☆59 · Updated 9 months ago
- A Python library to detect and extract listing data from HTML pages. ☆108 · Updated 8 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated this week
- Fast approximate strings search & spelling correction ☆60 · Updated 4 years ago
- Parse natural language time expressions in python ☆131 · Updated 3 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆55 · Updated 4 years ago
- Algorithms for "schema matching" ☆26 · Updated 9 years ago