currentslab / fastlangid
fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified Chinese (zh-hans), and Traditional Chinese (zh-hant)
☆39 Updated 2 years ago
Alternatives and similar repositories for fastlangid
Users that are interested in fastlangid are comparing it to the libraries listed below
- Tool to fix bitexts and tag near-duplicates for removal ☆30 Updated 4 months ago
- A simple neural truecaser written in pytorch and allennlp. ☆33 Updated 11 months ago
- ☆30 Updated 2 years ago
- A web application for tagging and retrieval of arguments in text ☆29 Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆33 Updated 5 years ago
- A python true casing utility that restores case information for texts ☆88 Updated 2 years ago
- c++ mosestokenizer ☆18 Updated last year
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆14 Updated 9 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆153 Updated last year
- python package for unsupervised text segmentation. ☆14 Updated 8 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆44 Updated last year
- A flexible sentence segmentation library using CRF model and regex rules ☆29 Updated last year
- Source code for the Apple reproduction ☆32 Updated 4 years ago
- Load embeddings and featurize your sentences. ☆30 Updated 7 months ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated 2 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 Updated 2 years ago
- Sentence transformers models for SpaCy ☆107 Updated 2 years ago
- GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning ☆27 Updated 4 years ago
- Automatic extraction of edited sentences from text edition histories. ☆83 Updated 3 years ago
- ☆171 Updated 2 months ago
- A python module for word inflections designed for use with spaCy. ☆92 Updated 5 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated 2 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated 8 months ago
- SegEval Segmentation Evaluation Package ☆56 Updated last year
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 Updated last year
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆51 Updated 6 months ago
- ☆33 Updated 3 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- ☆70 Updated 2 years ago
- Language detection using Spacy and Fasttext ☆55 Updated last year