mingruimingrui / ICU-tokenizer
ICU-based universal language tokenizer
☆32Updated 3 years ago
Alternatives and similar repositories for ICU-tokenizer
Users who are interested in ICU-tokenizer are comparing it to the libraries listed below.
- Language-agnostic BERT Sentence Embedding (LaBSE)☆152Updated 4 years ago
- ☆87Updated 3 years ago
- ☆36Updated 2 years ago
- CharBERT: Character-aware Pre-trained Language Model (COLING2020)☆120Updated 4 years ago
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining☆117Updated last year
- BERTserini☆26Updated 2 years ago
- AAAI-20 paper: Cross-Lingual Natural Language Generation via Pre-Training☆129Updated 3 years ago
- ☆92Updated 3 years ago
- SpanNER: Named Entity Re-/Recognition as Span Prediction☆131Updated 3 years ago
- ☆30Updated 4 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering☆38Updated 3 years ago
- SUPERT: Unsupervised multi-document summarization evaluation & generation☆94Updated 2 years ago
- Kex is a Python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public data…☆54Updated 3 years ago
- [EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction☆119Updated 3 years ago
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling☆71Updated 2 years ago
- Code and data to facilitate BERT/ELECTRA for document ranking. For details, refer to the paper - PARADE: Passage Representation Aggregation for…☆97Updated 2 years ago
- ☆42Updated 4 years ago
- Source code for the paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.☆55Updated 4 years ago
- This repo contains the code for the ACL 2020 paper "Coreference Resolution as Query-based Span Prediction"☆139Updated 4 years ago
- The official code of the "Frustratingly Easy System Combination for Grammatical Error Correction" paper☆56Updated last year
- ☆59Updated last year
- NoiseMix - data generation for natural language☆40Updated 7 years ago
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples☆76Updated 2 years ago
- Source code for EMNLP 2020 paper "Coreferential Reasoning Learning for Language Representation"☆68Updated 2 years ago
- Code and pre-trained models for the paper "Segatron: Segment-aware Transformer for Language Modeling and Understanding"☆18Updated 2 years ago
- Coreference resolution with different higher-order inference methods; implemented in PyTorch.☆36Updated 2 years ago
- Scripts to preprocess training and test data and to run fast_align and giza☆108Updated 3 years ago
- A library to conduct ranking experiments with transformers.☆160Updated 2 years ago
- PyTorch version of BERT-flow: one can apply BERT-flow to any PLM within the PyTorch framework.☆72Updated 3 years ago
- Code for cross-sentence grammatical error correction using multilayer convolutional seq2seq models (ACL 2019)☆50Updated 5 years ago