mingruimingrui / ICU-tokenizer
ICU based universal language tokenizer
☆30Updated 2 years ago
Alternatives and similar repositories for ICU-tokenizer:
Users that are interested in ICU-tokenizer are comparing it to the libraries listed below
- ☆36Updated 2 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering☆38Updated 3 years ago
- CharBERT: Character-aware Pre-trained Language Model (COLING2020)☆118Updated 3 years ago
- Code for ACL2021 paper: "GLGE: A New General Language Generation Evaluation Benchmark"☆58Updated 2 years ago
- ☆18Updated 3 years ago
- GMEG☆29Updated last month
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining☆118Updated last year
- Coreference resolution with different higher-order inference methods; implemented in PyTorch.☆36Updated last year
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection☆15Updated 3 years ago
- ☆16Updated 4 years ago
- This is the code for the EMNLP2020 Finding paper "BERT for Monolingual and Cross-Lingual Reverse Dictionary"☆19Updated 4 years ago
- ☆36Updated 3 years ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie…☆49Updated 2 years ago
- ☆46Updated 3 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Updated 2 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest…☆30Updated 4 years ago
- Source code for paper Grammatical Error Correction in Low-Resource Scenarios (W-NUT 2019)☆13Updated 2 years ago
- EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation☆96Updated last year
- ☆66Updated 3 years ago
- NoiseMix - data generation for natural language☆40Updated 6 years ago
- REALSumm: Re-evaluating Evaluation in Text Summarization☆71Updated 2 years ago
- codes and pre-trained models of paper "Segatron: Segment-aware Transformer for Language Modeling and Understanding"☆18Updated 2 years ago
- Lexically Constrained Neural Machine Translation with Levenshtein Transformer☆39Updated 4 years ago
- ☆40Updated 3 years ago
- Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval☆43Updated last year
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP☆51Updated 2 years ago
- ☆55Updated last year
- Word sense disambiguation using contextualized word embedding☆16Updated 5 years ago
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling☆71Updated 2 years ago
- Implementation of pQRNN in PyTorch☆46Updated 3 years ago