rflynn / regroup
Generate a regular expression that describes a set of strings.
☆30 Updated 2 years ago
Alternatives and similar repositories for regroup
Users that are interested in regroup are comparing it to the libraries listed below
- Common Crawl Index Server ☆68 Updated 2 months ago
- Python code and data for the post "Word Segmentation, or Makingsenseofthis" ☆17 Updated 2 years ago
- Python search module for fast approximate string matching ☆54 Updated 2 years ago
- A pure Python implementation of Aho-Corasick algorithm. ☆22 Updated 6 years ago
- It finds best synonyms from Google Books when you press a hotkey ☆30 Updated 10 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons ☆58 Updated last month
- "its like OAB in python because snake" ☆14 Updated 7 years ago
- Tripod is a tool/ML model for computing latent representations for large sequences ☆16 Updated last week
- Framework for evaluating text extraction algorithms implemented as web services ☆42 Updated 12 years ago
- CHeSF is the Chrome Headless Scraping Framework, a very very alpha code to scrape javascript intensive web pages ☆20 Updated 7 years ago
- Neural Elastic Inference and Search ☆19 Updated 5 years ago
- Python tool for normalizing text and text canonicalization (DISCONTINUED) ☆41 Updated 11 years ago
- Napkin is a simple tool to produce statistical analysis of a text ☆12 Updated last year
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- GHRecommender - personalized recommendations for GitHub projects based on information about repositories starred by the user ☆26 Updated 2 years ago
- Fast approximate strings search & spelling correction ☆58 Updated 3 years ago
- Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group. ☆81 Updated 4 years ago
- Algorithms for "schema matching" ☆26 Updated 8 years ago
- Polyglot skipgram embeddings, and their many health benefits ☆12 Updated 5 years ago
- A classifier for detecting soft 404 pages ☆17 Updated 2 years ago
- Fast Word Segmentation with Triangular Matrix ☆81 Updated 3 years ago
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian. ☆153 Updated 6 months ago
- a Deep Learning based Speller ☆27 Updated 6 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Python library for image hashing and deduplication ☆11 Updated 9 years ago
- Python library to share machine learning models easily and reliably. ☆18 Updated 5 years ago
- Extract Unique Word Lists From Wikipedia Database ☆12 Updated 4 years ago
- An Exploration into Graph Databases ☆28 Updated 9 years ago
- 📄Source code variable naming using a seq2seq architecture ☆10 Updated 5 years ago
- An efficient simhash implementation for python ☆124 Updated 5 years ago