coady / lupyne
Pythonic search engine based on PyLucene.
☆119 Updated 2 months ago
Related projects:
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆148 Updated last year
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆66 Updated 2 weeks ago
- ☆159 Updated 3 months ago
- A Python module for word inflections designed for use with spaCy. ☆90 Updated 4 years ago
- Use ML-Annotate to label data for machine learning purposes ☆104 Updated 4 years ago
- Super lightweight function registries for your library ☆172 Updated 3 months ago
- Python port of Boilerpipe library ☆81 Updated last month
- Text tokenization and sentence segmentation (segtok v2) ☆200 Updated 2 years ago
- Sentence transformers models for spaCy ☆104 Updated last year
- Python library to simplify working with jsonlines and ndjson data ☆264 Updated last month
- Confection: the sweetest config system for Python ☆175 Updated 3 months ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆65 Updated last year
- A simple client for doccano API. ☆81 Updated 3 months ago
- ☆70 Updated last year
- 🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle) ☆425 Updated 2 months ago
- Information extraction from English and German texts based on predicate logic ☆133 Updated last year
- A spaCy wrapper for DBpedia Spotlight ☆103 Updated last year
- A Python implementation of Lunr.js 🌖 ☆188 Updated last week
- Fast and robust date extraction from web pages, with Python or on the command-line ☆118 Updated 2 weeks ago
- Parse natural language time expressions in Python ☆131 Updated last year
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆191 Updated last year
- Language detection using spaCy and fastText ☆53 Updated 9 months ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 2 years ago
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆244 Updated 7 months ago
- Parse numbers written in natural language ☆104 Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆113 Updated 2 weeks ago
- ☆65 Updated 2 years ago
- Pure Python Aho-Corasick library. ☆209 Updated last year
- ☆46 Updated this week
- Accurately find/replace/remove emojis in text strings ☆153 Updated 9 months ago