IndoNLP / nusa-writes
NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
☆27 Updated last year
Alternatives and similar repositories for nusa-writes
Users who are interested in nusa-writes are comparing it to the repositories listed below.
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆94 Updated last year
- A curated list of research papers and resources on Indonesian languages ☆40 Updated last year
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 Updated 3 years ago
- ☆52 Updated 2 years ago
- XL-AMR is a sequence-to-graph cross-lingual AMR parser that exploits transfer learning (EMNLP2020). ☆17 Updated last year
- Experiments for XLM-V Transformers Integration ☆13 Updated 2 years ago
- Code and models used in "MUSS Multilingual Unsupervised Sentence Simplification by Mining Paraphrases". ☆99 Updated 2 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆74 Updated last year
- NTREX -- News Test References for MT Evaluation ☆87 Updated last year
- benchmarks for evaluating MT models ☆12 Updated last year
- The first large-scale summarization corpus for the Indonesian language. AACL 2020. ☆38 Updated 4 years ago
- Code and Data for the ACL 2022 paper "Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling" ☆11 Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated last month
- Multilingual Open Text ☆25 Updated 8 months ago
- https://liuzeming01.github.io/XDailyDialog/ ☆12 Updated 2 years ago
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" ☆25 Updated 7 months ago
- COMET for African languages ☆10 Updated last year
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆25 Updated 3 years ago
- The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021) ☆53 Updated 3 years ago
- A survey of corpora for Germanic low-resource languages and dialects ☆26 Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 3 years ago
- ☆17 Updated 3 years ago
- Official Implementation of "DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization." ☆143 Updated 3 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow. ☆99 Updated 10 months ago
- Code for the EMNLP 2020 paper titled "Chapter Captor: Text Segmentation in Novels" ☆30 Updated 5 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆87 Updated last year
- Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures. ☆33 Updated last year
- ☆18 Updated last year
- Code & Data for Comparative Opinion Summarization via Collaborative Decoding (Iso et al; Findings of ACL 2022) ☆23 Updated 10 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆107 Updated last year