zaemyung / sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
☆24 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for sentsplit
- PyTorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks ☆63 · Updated 2 years ago
- FactSumm: Factual Consistency Scorer for Abstractive Summarization ☆109 · Updated 10 months ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021) ☆36 · Updated 3 years ago
- KETOD: Knowledge-Enriched Task-Oriented Dialogue ☆31 · Updated last year
- Pre-training BART in Flax on The Pile dataset ☆20 · Updated 3 years ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆12 · Updated 11 months ago
- A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering ☆16 · Updated last year
- Megatron-LM 11B on Hugging Face Transformers ☆27 · Updated 3 years ago
- Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022) ☆75 · Updated last year
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆75 · Updated 2 months ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆26 · Updated 3 years ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆29 · Updated last year
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆55 · Updated last year
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi… ☆30 · Updated last year
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆46 · Updated 3 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 · Updated 3 months ago
- A tiny BERT for low-resource monolingual models ☆29 · Updated last month
- Code for our EACL 2021 paper "Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs". ☆38 · Updated 4 months ago
- Source code for Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction ☆43 · Updated 3 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated 8 months ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 2 months ago
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale ☆14 · Updated 3 years ago
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022) ☆76 · Updated 11 months ago