wikimedia / sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
☆65Updated 3 weeks ago
Alternatives and similar repositories for sentencex
Users who are interested in sentencex are comparing it to the libraries listed below.
- Faster, modernized fork of the language identification tool langid.py☆56Updated 7 months ago
- Seed Machine Translation Data☆32Updated 8 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.☆51Updated last week
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆167Updated last month
- The Open Parallel Corpus☆74Updated 3 months ago
- Multilingual sentence alignment using sentence embeddings☆120Updated 8 months ago
- Aksharamukha Python Library☆50Updated 5 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy☆63Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings.☆52Updated 4 years ago
- Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.☆249Updated 2 years ago
- ☆74Updated 3 months ago
- 80x faster and 95% accurate language identification with Fasttext☆158Updated last year
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki☆23Updated 2 weeks ago
- A modern, interlingual wordnet interface for Python☆254Updated last week
- Sentence aligner☆115Updated 4 years ago
- Efficient Low-Memory Aligner☆146Updated 6 months ago
- A Python module for word inflections designed for use with spaCy.☆92Updated 5 years ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2…☆68Updated 2 years ago
- Logical structure analysis for visually structured documents☆91Updated 2 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3)☆153Updated 2 years ago
- Efficient teacher-student models and scripts to make them☆51Updated last year
- Transform TMX to text☆27Updated 2 years ago
- Tool to fix bitexts and tag near-duplicates for removal☆30Updated 5 months ago
- OpusFilter - Parallel corpus processing toolkit☆106Updated 2 weeks ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.☆59Updated last year
- Bilingual term extractor☆54Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆144Updated last month
- Cython wrapper on Hunspell Dictionary☆66Updated last year
- Bitextor generates translation memories from multilingual websites☆294Updated 8 months ago
- Open information and community for machine translation☆79Updated 2 weeks ago