adbar / py3langid
Faster, modernized fork of the language identification tool langid.py
☆50 Updated last month
Alternatives and similar repositories for py3langid:
Users who are interested in py3langid compare it to the libraries listed below.
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated 5 months ago
- A Python module for word inflections designed for use with spaCy. ☆92 Updated 4 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆151 Updated last month
- Targeted language identifier, based on FastText and Hunspell. ☆33 Updated 2 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 Updated last year
- Transform TMX to text ☆29 Updated 2 years ago
- Rust-based Python wrapper for duckling library in Haskell ☆25 Updated 4 years ago
- Extracts plain text, language identification and more metadata from WARC records ☆20 Updated 5 months ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆111 Updated this week
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 Updated last year
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆69 Updated 8 months ago
- A Python truecasing utility that restores case information for texts ☆88 Updated 2 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated 9 months ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆46 Updated 3 weeks ago
- 🧪 Cutting-edge experimental spaCy components and features ☆96 Updated 8 months ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆12 Updated 5 months ago
- Corpus preprocessing ☆95 Updated 10 months ago
- Source code for the Apple reproduction ☆31 Updated 3 years ago
- Searching in-memory corpus with Corpus Query Language (CQL) ☆19 Updated last month
- ☆22 Updated 11 months ago
- Multilingual syllable annotation pipeline component for spaCy ☆39 Updated last year
- Fast and accurate natural language detection. Detector written in Python. Nito-ELD, ELD. ☆15 Updated last year
- A Named-Entity Recogniser based on Grobid. ☆49 Updated 4 months ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆104 Updated 2 months ago
- A sentence segmentation library with wide language support optimized for speed and utility. ☆55 Updated 4 months ago
- Featurize words into orthographic and phonological vectors. ☆40 Updated last year
- Text tokenization and sentence segmentation (segtok v2) ☆203 Updated 2 years ago
- Python Finite-State Toolkit ☆47 Updated last week
- Multi Tier Annotation Search ☆26 Updated 3 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated 10 months ago