Finite-state script normalization and processing utilities
☆46 · Feb 25, 2026 · Updated last week
Alternatives and similar repositories for nisaba
Users that are interested in nisaba are comparing it to the libraries listed below
- Read-only unofficial mirror of OpenFst ☆44 · May 15, 2022 · Updated 3 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 · Feb 6, 2024 · Updated 2 years ago
- A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts ☆16 · Dec 3, 2024 · Updated last year
- 🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Updated this week
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 4 months ago
- Forced alignment decoder for Whisper. ☆14 · Mar 13, 2024 · Updated last year
- The website of the Oscar Project ☆11 · Mar 27, 2025 · Updated 11 months ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆19 · Jun 5, 2025 · Updated 9 months ago
- Read-only unofficial mirror of the OpenGrm Thrax Grammar Development Tools ☆16 · May 2, 2019 · Updated 6 years ago
- 🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆20 · Apr 6, 2025 · Updated 10 months ago
- POS for African languages ☆19 · Jun 25, 2025 · Updated 8 months ago
- 🖋 Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024 ☆21 · Feb 17, 2026 · Updated 2 weeks ago
- MozoLM: A language model (LM) serving library ☆48 · Updated this week
- Compiled tools, datasets, and other resources for historical text normalization. ☆20 · Jun 18, 2019 · Updated 6 years ago
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to… ☆45 · May 25, 2021 · Updated 4 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆26 · Feb 16, 2026 · Updated 2 weeks ago
- Source stories from the African Storybook Project in Markdown format ☆22 · Jan 25, 2026 · Updated last month
- Random notes on Python internationalisation ☆19 · Aug 10, 2023 · Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆106 · Apr 20, 2024 · Updated last year
- Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition" ☆32 · Jun 20, 2023 · Updated 2 years ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark ☆35 · May 7, 2025 · Updated 9 months ago
- A database of number names for 186 languages, locales, and scripts ☆67 · Mar 3, 2023 · Updated 3 years ago
- Collection of auditory models. ☆33 · Feb 4, 2024 · Updated 2 years ago
- Targeted language identifier, based on FastText and Hunspell. ☆38 · Sep 4, 2025 · Updated 6 months ago
- Simple typescript, node, yarn, serverless framework, scaffold with husky, eslint, and prettier ☆18 · Sep 12, 2023 · Updated 2 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Apr 2, 2022 · Updated 3 years ago
- Implementation of Google's USM speech model in Pytorch ☆35 · Feb 7, 2026 · Updated 3 weeks ago
- Cantonese Text to Speech with VITS implementation ☆37 · Apr 8, 2023 · Updated 2 years ago
- An ASR decoder and graph-building project ☆33 · May 26, 2022 · Updated 3 years ago
- A simple neural truecaser written in pytorch and allennlp. ☆33 · Jun 17, 2024 · Updated last year
- ☆32 · Sep 27, 2021 · Updated 4 years ago
- Generative and Parametric design code: featuring Processing / Python / Javascript / HTML / CSS ☆14 · Nov 4, 2020 · Updated 5 years ago
- Chat with your data while uploading a pdf file and using a local LLM. ☆11 · Mar 19, 2024 · Updated last year
- Exploratory Data Analysis of Time Series Data and Forecasting using Naïve Approach, Moving Average Method, Simple Exponential Smoothenin… ☆12 · Jul 2, 2018 · Updated 7 years ago
- Cache surefire/failsafe at scale ☆16 · Updated this week
- English-Chinese-Japanese translation dataset of the terms in Genshin Impact ☆39 · Feb 25, 2026 · Updated last week
- ☆13 · Nov 21, 2025 · Updated 3 months ago
- ☆10 · Feb 2, 2021 · Updated 5 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 · Jul 30, 2025 · Updated 7 months ago