mingruimingrui / ICU-tokenizerLinks
ICU based universal language tokenizer
☆33Updated 3 years ago
Alternatives and similar repositories for ICU-tokenizer
Users that are interested in ICU-tokenizer are comparing it to the libraries listed below
Sorting:
- Language-agnostic BERT Sentence Embedding (LaBSE)☆153Updated 5 years ago
- SpanNER: Named EntityRe-/Recognition as Span Prediction☆131Updated 3 years ago
- CharBERT: Character-aware Pre-trained Language Model (COLING2020)☆121Updated 4 years ago
- Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.☆72Updated 4 years ago
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining☆117Updated 2 years ago
- ☆36Updated 3 years ago
- [EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction☆120Updated 3 years ago
- X-BERT: eXtreme Multi-label Text Classification with BERT☆52Updated 6 years ago
- This repo contains the code for ACL2020 paper "Coreference Resolution as Query-based Span Prediction"☆139Updated 5 years ago
- ☆92Updated 3 years ago
- ☆42Updated 5 years ago
- BERTserini☆26Updated 2 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow.☆95Updated 6 months ago
- State of the art Semantic Sentence Embeddings☆99Updated 3 years ago
- ☆46Updated 3 years ago
- This repo supports various cross-lingual transfer learning & multilingual NLP models.☆92Updated 2 years ago
- code and data to faciliate BERT/ELECTRA for document ranking. Details refer to the paper - PARADE: Passage Representation Aggregation for…☆97Updated 2 years ago
- ☆18Updated 4 years ago
- SUPERT: Unsupervised multi-document summarization evaluation & generation☆95Updated 2 years ago
- IEEE/ACM TASLP 2020: SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models☆180Updated 4 years ago
- ☆57Updated 2 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…☆81Updated 4 years ago
- [EMNLP 2021] Improving and Simplifying Pattern Exploiting Training☆153Updated 3 years ago
- Scripts to preprocess training and test data and to run fast_align and giza☆108Updated 3 years ago
- AAAI-20 paper: Cross-Lingual Natural Language Generation via Pre-Training☆129Updated 4 years ago
- Code and data for the paper "Soft Gazetteers for Low-resource Named Entity Recognition"☆19Updated 4 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering☆38Updated 3 years ago
- Self-supervised NER prototype - updated version (69 entity types - 17 broad entity groups). Uses pretrained BERT models with no fine tuni…☆78Updated 3 years ago
- ☆68Updated 4 months ago
- Dataset for NAACL 2021 paper: "DART: Open-Domain Structured Data Record to Text Generation"☆155Updated 2 years ago