NusaWrites is an in-depth analysis of corpus collection strategies and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
☆28 · Sep 27, 2024 · Updated last year
Alternatives and similar repositories for nusa-writes
Users who are interested in nusa-writes are comparing it to the libraries listed below.
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆95 · Jan 24, 2025 · Updated last year
- ☆13 · Sep 6, 2022 · Updated 3 years ago
- A living document for all things Common Voice. ☆14 · Jun 24, 2024 · Updated last year
- ☆17 · Dec 12, 2024 · Updated last year
- Benchmarking Multidomain English-Indonesian Machine Translation ☆16 · Dec 19, 2020 · Updated 5 years ago
- The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, … ☆78 · Nov 16, 2024 · Updated last year
- Visualize constituent and dependency parses as PDF or image formats, through GraphViz. ☆32 · Feb 11, 2021 · Updated 5 years ago
- Official repository for the paper "High-Dimension Human Value Representation in Large Language Models" (NAACL'25 Main) ☆23 · Jul 9, 2024 · Updated last year
- A collaborative project to collect datasets in Indonesian languages. ☆279 · Jun 2, 2024 · Updated last year
- ☆23 · Aug 7, 2023 · Updated 2 years ago
- High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper… ☆110 · May 8, 2023 · Updated 2 years ago
- The code implementation of the EMNLP2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene… ☆27 · Nov 13, 2023 · Updated 2 years ago
- Twpipe is a pipeline toolkit that parses raw tweets into Universal Dependencies. ☆28 · Apr 24, 2019 · Updated 6 years ago
- The implementation of "Neural Machine Translation without Embeddings", NAACL 2021 ☆33 · Jun 9, 2021 · Updated 4 years ago
- ☆35 · Jun 15, 2023 · Updated 2 years ago
- Continual Resilient (CoRe) Optimizer for PyTorch ☆11 · Jun 10, 2024 · Updated last year
- This repository is about how to build an SQLite version of the Arabic WordNet database. ☆10 · Mar 19, 2019 · Updated 6 years ago
- Creating super-parallel corpora of more than 1,500 unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21) ☆41 · Jun 20, 2021 · Updated 4 years ago
- Download and scrape bilibili.tv (BStation) videos easily and for free. ☆11 · Jul 22, 2023 · Updated 2 years ago
- MG top-down beam parsing ☆13 · Jul 2, 2018 · Updated 7 years ago
- Scrape web content into readable Markdown for LLMs and human readers. ☆10 · Feb 19, 2024 · Updated 2 years ago
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github… ☆14 · Dec 19, 2022 · Updated 3 years ago
- ☆16 · Updated this week
- Named Entity (NER) annotations of the Hebrew Treebank (Haaretz newspaper) corpus, including: morpheme and token level NER labels, nested … ☆10 · Dec 27, 2021 · Updated 4 years ago
- ☆38 · Apr 17, 2024 · Updated last year
- ☆44 · Nov 17, 2024 · Updated last year
- A Multilingual Replicable Instruction-Following Model ☆96 · Jun 11, 2023 · Updated 2 years ago
- The Grammar Matrix ☆15 · Jan 22, 2026 · Updated last month
- ☆11 · Sep 8, 2024 · Updated last year
- Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021. ☆10 · Jun 27, 2022 · Updated 3 years ago
- A Python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- ☆10 · Oct 28, 2019 · Updated 6 years ago
- Official repo for ACL 2023 paper Code4Struct: Code Generation for Few-Shot Structured Prediction from Natural Language. ☆43 · Jan 7, 2024 · Updated 2 years ago
- GUI application for the Klatt formant synthesizer package ☆11 · Feb 16, 2026 · Updated 2 weeks ago
- Supplementary materials for "Evaluating generalised additive mixed modelling strategies for dynamic speech analysis" ☆10 · Jan 25, 2021 · Updated 5 years ago
- This Node.js script automates the process of downloading and extracting source maps from websites. It uses Puppeteer to navigate web page… ☆18 · Dec 17, 2025 · Updated 2 months ago
- Modularized version of the Pink Trombone voice synthesizer ☆12 · May 5, 2019 · Updated 6 years ago
- Vector Symbolic Architecture library ☆11 · Updated this week