zaemyung / sentsplit
A flexible sentence segmentation library using CRF model and regex rules
☆28 · Updated 10 months ago
Alternatives and similar repositories for sentsplit:
Users who are interested in sentsplit are comparing it to the libraries listed below:
- PyTorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks ☆63 · Updated 2 years ago
- The Shmoop Corpus ☆16 · Updated 4 years ago
- A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering ☆16 · Updated 2 years ago
- ☆28 · Updated 2 years ago
- KETOD Knowledge-Enriched Task-Oriented Dialogue ☆32 · Updated 2 years ago
- FactSumm: Factual Consistency Scorer for Abstractive Summarization ☆110 · Updated last year
- ☆20 · Updated 2 years ago
- ☆36 · Updated 2 years ago
- Megatron-LM 11B on Hugging Face Transformers ☆27 · Updated 3 years ago
- Codebase for public release of the plug-and-blend framework. ☆22 · Updated 2 years ago
- A Benchmark Dataset for Understanding Disfluencies in Question Answering ☆62 · Updated 3 years ago
- Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting ☆17 · Updated 3 years ago
- ☆13 · Updated 3 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆76 · Updated 4 months ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 · Updated 3 years ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale ☆14 · Updated 3 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 5 months ago
- CoNLL 2005 SRL (Semantic Role Labeling) evaluation script, implemented in Python ☆8 · Updated 6 years ago
- ✅ How Robust are Fact Checking Systems on Colloquial Claims? In NAACL-HLT, 2021. ☆23 · Updated 3 years ago
- ☆43 · Updated 4 years ago
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022) ☆78 · Updated last year
- Source code for Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction ☆43 · Updated 3 years ago
- NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021) ☆36 · Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 3 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Updated 2 years ago
- Pre-training BART in Flax on The Pile dataset ☆20 · Updated 3 years ago
- The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021) ☆52 · Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆54 · Updated 2 years ago