kerighan / eldar
Boolean text search in Python
☆45 Updated 2 years ago
Alternatives and similar repositories for eldar:
Users who are interested in eldar are comparing it to the libraries listed below:
- ☆54 Updated last year
- Fast and robust date extraction from web pages, with Python or on the command line ☆126 Updated 4 months ago
- ☆30 Updated 2 years ago
- A Streamlit component for annotating text by selecting it. ☆40 Updated 10 months ago
- Multilingual syllable annotation pipeline component for spaCy ☆39 Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆55 Updated 5 months ago
- 🔤 Measure edit distance based on keyboard layout ☆60 Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated 2 years ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆17 Updated 8 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with regex-like syntax. ☆59 Updated last year
- A Python package to simulate typographical errors. ☆34 Updated last year
- A helper library full of URL-related heuristics. ☆69 Updated last month
- Small Python package to measure OCR quality and other related metrics. ☆21 Updated last year
- Blazing fast topic modelling for short texts. ☆31 Updated last month
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆62 Updated this week
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- ☆69 Updated 3 years ago
- Quote identification, attribution and resolution. ☆12 Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 Updated last month
- Robust and fast topic models with sentence-transformers. ☆48 Updated this week
- An open-source package for Python to clean raw text data ☆69 Updated last year
- Tools for interactive visual exploration of semantic embeddings. ☆32 Updated 8 months ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆67 Updated 3 weeks ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated last year
- Finds linguistic patterns effortlessly ☆36 Updated last year
- Python package for deduplication/entity resolution using active learning ☆79 Updated 8 months ago
- Library for fast text representation and classification. ☆28 Updated last year
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆72 Updated last week
- 🔢 Work with static vector models ☆28 Updated 2 weeks ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆137 Updated 4 months ago