zaemyung / sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
☆29 · Updated last year
Alternatives and similar repositories for sentsplit:
Users who are interested in sentsplit are comparing it to the libraries listed below:
- ☆29 · Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆33 · Updated 4 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- Megatron LM 11B on Huggingface Transformers ☆27 · Updated 3 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆30 · Updated 2 months ago
- ☆44 · Updated 4 years ago
- KETOD: Knowledge-Enriched Task-Oriented Dialogue ☆32 · Updated 2 years ago
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 · Updated 3 years ago
- A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering ☆16 · Updated 2 years ago
- ☆20 · Updated 2 years ago
- ☆46 · Updated 3 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆55 · Updated 2 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting?