zaemyung / sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
☆29 · Updated last year
Alternatives and similar repositories for sentsplit:
Users who are interested in sentsplit are comparing it to the repositories listed below:
- Megatron LM 11B on Huggingface Transformers ☆27 · Updated 3 years ago
- KETOD Knowledge-Enriched Task-Oriented Dialogue ☆32 · Updated 2 years ago
- The Shmoop Corpus ☆16 · Updated 4 years ago
- ☆29 · Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- ☆36 · Updated 2 years ago
- NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021) ☆36 · Updated 3 years ago
- ☆44 · Updated 4 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆30 · Updated last month
- A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering ☆16 · Updated 2 years ago
- Pytorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks ☆63 · Updated 3 years ago
- ☆46 · Updated 2 years ago
- Pre-training BART in Flax on The Pile dataset ☆20 · Updated 3 years ago
- ☆20 · Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- ☆32 · Updated last week
- A library for data streaming and augmentation ☆20 · Updated last year
- Library for experimenting with state-of-the-art evaluation metrics like UScore ☆11 · Updated last year
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Updated 3 years ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 · Updated 3 years ago
- Code for the paper "Modelling Latent Translations for Cross-Lingual Transfer" ☆17 · Updated 3 years ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆16 · Updated last year
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" ☆25 · Updated 3 years ago
- ☆25 · Updated last year
- Multilingual Open Text ☆25 · Updated 5 months ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆16 · Updated last year
- Official code for LEWIS, from: "LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer", ACL-IJCNLP 2021 Findings by Machel Rei… ☆31 · Updated 2 years ago
- ☆12 · Updated last year
- Code for AAAI 2021 paper "Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance" ☆25 · Updated 2 years ago
- NTREX -- News Test References for MT Evaluation ☆81 · Updated 9 months ago