currentslab / fastlangid
fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified Chinese (zh-hans), and Traditional Chinese (zh-hant)
☆39, updated 2 years ago
Alternatives and similar repositories for fastlangid:
Users who are interested in fastlangid are comparing it to the libraries listed below:
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. (☆86, updated 3 years ago)
- A simple neural truecaser written in pytorch and allennlp. (☆33, updated 8 months ago)
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation (☆14, updated 6 months ago)
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning (☆27, updated 3 years ago)
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. (☆44, updated 9 months ago)
- Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation" (☆40, updated 6 years ago)
- Rust-based Python wrapper for duckling library in Haskell (☆25, updated 4 years ago)
- A flexible sentence segmentation library using CRF model and regex rules (☆29, updated last year)
- Tool to fix bitexts and tag near-duplicates for removal (☆30, updated last month)
- BERT models for many languages created from Wikipedia texts (☆33, updated 4 years ago)
- Learning BPE embeddings by first learning a segmentation model and then training word2vec (☆19, updated 2 years ago)
- A python true casing utility that restores case information for texts (☆88, updated 2 years ago)
- Source code for the Apple reproduction (☆32, updated 3 years ago)
- A tiny BERT for low-resource monolingual models (☆31, updated 5 months ago)
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages (☆10, updated last year)
- zero-vocab or low-vocab embeddings (☆18, updated 2 years ago)
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models (☆65, updated 2 years ago)
- Automatic extraction of edited sentences from text edition histories. (☆82, updated 3 years ago)
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP (☆14, updated 3 years ago)
- A python module for word inflections designed for use with spaCy. (☆92, updated 5 years ago)
- Tool for parsing and converting various span encoding schemes. (☆22, updated last year)
- A simple client for doccano API. (☆84, updated 9 months ago)
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… (☆101, updated last year)
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection (☆15, updated 3 years ago)
- Tools for extracting parallel corpora from article titles across languages in Wikipedia (☆72, updated 10 years ago)
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. (☆31, updated 4 years ago)