IndoNLP/nusa-writes

IndoNLP / nusa-writes

NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.

☆30

Alternatives and similar repositories for nusa-writes

Users that are interested in nusa-writes are comparing it to the libraries listed below.

HLTCHKUST / UniVaR
Official repository for paper "High-Dimension Human Value Representation in Large Language Models" (NAACL'25 Main)
☆23 · Jul 9, 2024 · Updated 2 years ago
JRMeyer / common-voice-stats
A living document for all things Common Voice.
☆14 · Jun 24, 2024 · Updated 2 years ago
IndoNLP / indonlg
The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, …
☆82 · Nov 16, 2024 · Updated last year
fajri91 / minangNLP
Minangkabau NLP corpus. PACLIC 2020
☆11 · Jun 7, 2021 · Updated 5 years ago
fajri91 / NeuralRST-TopDown
EACL 2021
☆11 · May 4, 2021 · Updated 5 years ago
Wikidepia / indonesian_datasets
NLP Datasets for Indonesian
☆128 · Apr 25, 2026 · Updated 2 months ago
IndoNLP / cendol
Indonesian T0 | Instruction-tuning for low-resource and extremely low-resource Austronesian languages
☆18 · Jun 24, 2024 · Updated 2 years ago
fajri91 / discourse_probing
Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021.
☆10 · Jun 27, 2022 · Updated 4 years ago
UKPLab / maps
Multicultural Proverbs and Sayings
☆13 · Jan 11, 2025 · Updated last year
The-Gupta / TED-Scraper
Complete Web Scraping of TED.com for Metadata, Transcript, Audio, Video, Images using Parallel Programming
☆11 · Jun 25, 2020 · Updated 6 years ago
HLTCHKUST / KnowExpert
The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters".
☆17 · May 24, 2022 · Updated 4 years ago
haryoa / indo-collex
Welcome to our repository! This repository hosts the data on "IndoCollex: A Testbed for Morphological Transformation of Indonesian Word …
☆24 · Aug 10, 2021 · Updated 4 years ago
indonesian-nlp / multilingual-asr
Multilingual Speech Recognition for Indonesian Languages
☆72 · Oct 5, 2022 · Updated 3 years ago
monicamanda / twitter-personality-classification
Classification of Twitter users' personality based on their tweets. The Big Five model is used to classify personality.
☆15 · Aug 30, 2020 · Updated 5 years ago
tongzhou21 / Oasis
☆23 · Aug 7, 2023 · Updated 2 years ago
mbzuai-nlp / bactrian-x
A Multilingual Replicable Instruction-Following Model
☆96 · Jun 11, 2023 · Updated 3 years ago
littlehacker26 / Discriminator-Cooperative-Unlikelihood-Prompt-Tuning
The code implementation of the EMNLP2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene…
☆27 · Nov 13, 2023 · Updated 2 years ago
ehsanasgari / 1000Langs
Creating super-parallel corpora of more than 1,500 unique languages for NLP research
☆33 · Dec 8, 2022 · Updated 3 years ago
LAION-AI / Anh
Anh - LAION's multilingual assistant datasets and models
☆28 · Apr 5, 2023 · Updated 3 years ago
lemaoliu / retrieval-generation-tutorial
☆11 · Jun 19, 2022 · Updated 4 years ago
ryanzhumich / sparc_atis_pytorch
☆10 · Oct 28, 2019 · Updated 6 years ago
zliucr / Crosslingual-NLU
Zero-shot Cross-lingual Task-Oriented Dialogue Systems (EMNLP 2019)
☆24 · Nov 9, 2019 · Updated 6 years ago
sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆94 · Feb 24, 2025 · Updated last year
MTTeql / MT-Teql
Research Artifact For Our Submission To VLDB
☆11 · Oct 27, 2021 · Updated 4 years ago
freesunshine0316 / sembleu
SemBleu: A Robust Metric for AMR Parsing Evaluation
☆12 · Feb 22, 2021 · Updated 5 years ago
sebastianruder / emnlp2021-multiqa-tutorial
EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering
☆38 · Nov 7, 2021 · Updated 4 years ago
leuchine / self_play_picard
Using self-play to augment multi-turn text-to-SQL datasets
☆12 · Oct 20, 2022 · Updated 3 years ago
NJUNLP / MMT-LLM
☆36 · Jun 15, 2023 · Updated 3 years ago
zhongwanjun / CARP
Code for the table-based open-domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D…
☆12 · Sep 16, 2022 · Updated 3 years ago
Hi-ZenanXu / Syntax-Enhanced_Pre-trained_Model
Source Data of ACL2021 paper "Syntax-Enhanced Pre-trained Model"
☆11 · Jun 1, 2021 · Updated 5 years ago
zzshou / amr-data-augmentation
Code for our paper "AMR-DA: Data augmentation by abstract meaning representation" in ACL 2022
☆13 · May 17, 2022 · Updated 4 years ago
mahadi-nahid / TabSQLify
[NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition
☆18 · Jan 5, 2026 · Updated 6 months ago
fajri91 / IndoMMLU
☆41 · Oct 10, 2023 · Updated 2 years ago
shizhediao / automate-cot
Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"
☆20 · Feb 24, 2024 · Updated 2 years ago
Rojak-NLP / LLM-Code-Mixing
Can LLMs generate code-mixed sentences through zero-shot prompting?
☆11 · Apr 18, 2023 · Updated 3 years ago
itayle / diverse-demonstrations
Diverse Demonstrations Improve In-context Compositional Generalization
☆13 · Jul 7, 2023 · Updated 3 years ago
d223302 / A-Closer-Look-To-LLM-Evaluation
Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"
☆19 · Oct 9, 2023 · Updated 2 years ago
princeton-nlp / Cognac
Repo for paper: Controllable Text Generation with Language Constraints
☆20 · Jun 20, 2023 · Updated 3 years ago
AIM3-RUC / MPMQA
Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023)
☆21 · Nov 28, 2022 · Updated 3 years ago