zaemyung / sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
☆29 · Updated last year
Alternatives and similar repositories for sentsplit:
Users interested in sentsplit are comparing it to the libraries listed below:
- KETOD: Knowledge-Enriched Task-Oriented Dialogue ☆32 · Updated 2 years ago
- Megatron LM 11B on Huggingface Transformers ☆27 · Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 5 months ago
- PyTorch implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks ☆63 · Updated 3 years ago
- The Shmoop Corpus ☆16 · Updated 4 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated last year
- NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021) ☆36 · Updated 3 years ago
- A library for data streaming and augmentation ☆20 · Updated 11 months ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale ☆14 · Updated 3 years ago
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 · Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆33 · Updated 4 years ago
- ☆28 · Updated 2 years ago
- This tool helps automatic generation of grammatically valid synthetic Code-mixed data by utilizing linguistic theories such as Equivalenc… ☆53 · Updated 7 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning