currentslab / fastlangid
fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified Chinese (zh-hans) and Traditional Chinese (zh-hant)
☆39 Updated 2 years ago
Alternatives and similar repositories for fastlangid:
Users that are interested in fastlangid are comparing it to the libraries listed below
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated 5 months ago
- BERT models for many languages created from Wikipedia texts ☆34 Updated 4 years ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 Updated 3 years ago
- ☆21 Updated 5 years ago
- ☆33 Updated 3 years ago
- A simple neural truecaser written in pytorch and allennlp. ☆32 Updated 7 months ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 3 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆14 Updated 5 months ago
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT ☆26 Updated 4 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated 2 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆10 Updated 11 months ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆44 Updated 8 months ago
- c++ mosestokenizer ☆17 Updated 10 months ago
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP ☆13 Updated 3 years ago
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… ☆101 Updated last year
- A python true casing utility that restores case information for texts ☆88 Updated 2 years ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆72 Updated 9 years ago
- Source code for the Apple reproduction ☆31 Updated 3 years ago
- Open source library for few shot NLP ☆77 Updated last year
- Automatic extraction of edited sentences from text edition histories. ☆82 Updated 2 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆54 Updated 2 years ago
- XL-AMR is a sequence-to-graph cross-lingual AMR parser that exploits transfer learning (EMNLP2020). ☆16 Updated 6 months ago
- A flexible sentence segmentation library using CRF model and regex rules ☆28 Updated 11 months ago
- Bicleaner fork that uses neural networks ☆39 Updated 6 months ago
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale ☆14 Updated 3 years ago
- GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning ☆27 Updated 3 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆15 Updated 11 months ago
- A tiny BERT for low-resource monolingual models ☆31 Updated 4 months ago
- CoNLL 2005 SRL (Semantic Role Labeling) evaluation script, implemented in Python ☆8 Updated 6 years ago
- ☆36 Updated 2 years ago