rflynn / regroup
Generate a regular expression that describes a set of strings.
☆31Updated 3 years ago
Alternatives and similar repositories for regroup
Users that are interested in regroup are comparing it to the libraries listed below
- Common Crawl Index Server☆71Updated 11 months ago
- Python code and data for the post "Word Segmentation, or Makingsenseofthis"☆17Updated 3 years ago
- Find strings/words in text; convenience and C speed☆126Updated 3 years ago
- Fast multi-keyword search engine for text strings☆258Updated last year
- English word segmentation, written in pure-Python, and based on a trillion-word corpus.☆378Updated 3 years ago
- Spam filtering made easy for you☆144Updated 6 years ago
- Python extension module for accelerating regular expressions using libesm☆132Updated 2 years ago
- Creates github index for similar repositories discovery☆192Updated 9 years ago
- A distributed system for mining common crawl using SQS, AWS-EC2 and S3☆21Updated 11 years ago
- Adaptive crawler which uses Reinforcement Learning methods☆168Updated this week
- Python module to generate all regular expression matches☆187Updated last year
- Preparing DMOZ dataset for my n-Gram LM-based URL classification research☆31Updated 11 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning☆119Updated this week
- Parse natural language time expressions in python☆131Updated 3 years ago
- Polyglot skipgram embeddings, and their many health benefits☆12Updated 6 years ago
- Automatically extracts and normalizes an online article or blog post publication date☆118Updated 2 years ago
- Lightning Fast Language Prediction 🚀☆167Updated 5 months ago
- Knowledge extraction from web data☆92Updated 7 years ago
- A Python library to detect and extract listing data from HTML pages.☆108Updated 8 years ago
- Python BK-tree data structure to allow fast querying of "close" matches☆187Updated 4 years ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…☆55Updated last year
- An efficient simhash implementation for python☆127Updated 6 years ago
- A classifier for detecting soft 404 pages☆16Updated 3 years ago
- Language Detection with Infinity-gram☆230Updated 10 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts☆81Updated 7 years ago
- Nostril: Nonsense String Evaluator☆199Updated 3 years ago
- Extracts the top level domain (TLD) from the URL given.☆185Updated 8 months ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency.☆193Updated 3 years ago
- A pure python implementation of locality sensitive hashing for text documents☆87Updated 10 years ago
- Algorithms for URL Classification☆19Updated 10 years ago