mingruimingrui / ICU-tokenizer
ICU-based universal language tokenizer
☆29Updated 2 years ago
Related projects
Alternatives and complementary repositories for ICU-tokenizer
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples☆73Updated 2 years ago
- The dataset and PyTorch Implementation for ACL 2020 paper "MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Ans…☆44Updated 4 years ago
- ☆46Updated 3 years ago
- ☆66Updated 3 years ago
- PyTorch version of BERT-flow: One can apply BERT-flow to any PLM within the PyTorch framework.☆69Updated 3 years ago
- source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.☆56Updated 3 years ago
- SpanNER: Named Entity Re-/Recognition as Span Prediction☆124Updated 2 years ago
- ☆54Updated last year
- Code and data to facilitate BERT/ELECTRA for document ranking. For details, refer to the paper - PARADE: Passage Representation Aggregation for…☆97Updated last year
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling☆71Updated last year
- Code and Data for SIGIR 2020 Paper "Few-Shot Generative Conversational Query Rewriting"☆65Updated last year
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining☆118Updated last year
- ☆42Updated 4 years ago
- ☆92Updated 3 years ago
- CharBERT: Character-aware Pre-trained Language Model (COLING2020)☆117Updated 3 years ago
- Codes for the paper "Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding" (ACL-IJCNLP 2021)☆40Updated 3 years ago
- Code and data for the paper "Soft Gazetteers for Low-resource Named Entity Recognition"☆19Updated 4 years ago
- ☆36Updated 2 years ago
- This is the repository for SemEval 2021 Task 4: Reading Comprehension of Abstract Meaning. It includes code for baseline models and data.☆30Updated 3 years ago
- [NAACL'22] TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning☆92Updated 2 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering☆38Updated 3 years ago
- Code for ACL2021 paper: "GLGE: A New General Language Generation Evaluation Benchmark"☆58Updated 2 years ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie…☆49Updated 2 years ago
- ☆29Updated 4 years ago
- Language-agnostic BERT Sentence Embedding (LaBSE)☆140Updated 4 years ago
- ☆35Updated 3 years ago
- ☆63Updated last year
- ☆21Updated 2 years ago
- ☆116Updated 2 years ago
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations☆133Updated 5 months ago