currentslab / fastlangid
fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified Chinese (zh-hans) and Traditional Chinese (zh-hant)
☆39 Updated last year
Related projects
Alternatives and complementary repositories for fastlangid
- A simple neural truecaser written in PyTorch and AllenNLP. ☆32 Updated 5 months ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated 3 months ago
- BERT models for many languages created from Wikipedia texts ☆34 Updated 4 years ago
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT ☆25 Updated 3 years ago
- A Python truecasing utility that restores case information for texts ☆87 Updated 2 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆12 Updated 2 months ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated last year
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP ☆14 Updated 3 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 Updated 9 months ago
- Alternative implementation of the coreference scorer for the CoNLL-2011/2012 shared tasks on coreference resolution ☆11 Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆29 Updated last month
- A flexible sentence segmentation library using CRF model and regex rules ☆24 Updated 8 months ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆43 Updated 6 months ago
- OpusFilter - Parallel corpus processing toolkit ☆102 Updated 3 months ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆72 Updated 9 years ago
- Python package for unsupervised text segmentation. ☆14 Updated 8 years ago
- Tool for parsing and converting various span encoding schemes. ☆22 Updated 10 months ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 3 years ago
- List of corpora annotated for coreference for different languages ☆17 Updated 3 months ago
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… ☆102 Updated 11 months ago
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆25 Updated 4 years ago
- Language detection using spaCy and fastText ☆54 Updated 11 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆67 Updated 7 months ago
- Source code for the Apple reproduction ☆31 Updated 3 years ago
- Data collection, alignment and TAUS repository ☆20 Updated 6 years ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆27 Updated 3 years ago