currentslab / fastlangid
fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified Chinese (zh-hans), and Traditional Chinese (zh-hant)
☆38 Updated last year
Related projects
Alternatives and complementary repositories for fastlangid
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated 2 months ago
- BERT models for many languages created from Wikipedia texts ☆34 Updated 4 years ago
- Source code for the Apple reproduction ☆31 Updated 3 years ago
- List of corpora annotated for coreference for different languages ☆17 Updated 3 months ago
- ☆34 Updated 3 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 Updated 2 years ago
- ☆21 Updated 4 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated last year
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆20 Updated 3 weeks ago
- GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning ☆27 Updated 3 years ago
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆25 Updated 4 years ago
- Bicleaner fork that uses neural networks ☆38 Updated 3 months ago
- SegEval Segmentation Evaluation Package ☆55 Updated last year
- A Python true casing utility that restores case information for texts ☆87 Updated last year
- A simple neural truecaser written in PyTorch and AllenNLP. ☆31 Updated 4 months ago
- Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation" ☆39 Updated 5 years ago
- Deep Dependency Representation ☆16 Updated 6 years ago
- Tool for parsing and converting various span encoding schemes. ☆22 Updated 9 months ago
- Python package for unsupervised text segmentation. ☆14 Updated 8 years ago
- C++ mosestokenizer ☆16 Updated 7 months ago
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT ☆25 Updated 3 years ago
- Automatic extraction of edited sentences from text edition histories. ☆81 Updated 2 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 3 years ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆72 Updated 9 years ago
- A tiny BERT for low-resource monolingual models ☆29 Updated last month
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆12 Updated 2 months ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆75 Updated this week
- Bilingual sentence similarity classifier using TensorFlow ☆19 Updated 5 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆72 Updated 2 years ago
- This repository contains the code for "BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Representations". ☆63 Updated 4 years ago