proger / uk4b
GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian
☆18 Updated last year
Related projects
Alternatives and complementary repositories for uk4b
- Phonograms and syntagms: processing tools ☆21 Updated 9 months ago
- A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts. ☆15 Updated 5 years ago
- ☆23 Updated 2 years ago
- Simple WFST for Ukrainian ITN based on NVIDIA NeMo and Pynini ☆19 Updated last year
- Agent toolkit for 100 hours of speech and 10 GiB of text ☆13 Updated 9 months ago
- Dictionary of obscene words for the Ukrainian language ☆17 Updated 3 years ago
- A collection of datasets for the Ukrainian language ☆55 Updated 3 months ago
- ☆26 Updated last year
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 Updated 9 months ago
- A collection of links to Ukrainian language tools ☆30 Updated 2 years ago
- Grammar rules and dictionaries for the phonetic transcription of Russian sentences ☆33 Updated 3 years ago
- Training scripts for Speech-To-Text models for the Ukrainian language ☆34 Updated last year
- Dictionary of word stresses in the Ukrainian language 🇺🇦 ☆19 Updated last month
- LTG-Bert ☆29 Updated 10 months ago
- Simple Python lib to tokenize texts into sentences and sentences into words. Small, fast and robust. Comes with a Ukrainian flavour ☆60 Updated last year
- Data from the "Crowdsourcing of Parallel Corpora: the Case of Style Transfer for Detoxification" paper ☆14 Updated 3 weeks ago
- Russian coreference resolution made as simple and accessible as could be ☆12 Updated 2 years ago
- ☆56 Updated last year
- ☆13 Updated 3 years ago
- This repository provides data and code for the "Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription" paper. ☆15 Updated 3 years ago
- Adds word stress to Ukrainian texts ☆45 Updated last month
- Ukrainian ELECTRA model ☆12 Updated last year
- UNLP 2024 Shared Task on LLM instruction-tuning for Ukrainian ☆13 Updated 7 months ago
- Simplified recipes for preparing commonly used speech datasets, and a PyTorch-compatible Python data loader that can perform standard fea… ☆15 Updated last year
- T5-based (Russian) text normalization ☆19 Updated 9 months ago
- UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms. ☆26 Updated 2 months ago
- Phone inventory library ☆15 Updated last year
- Code for "Error-driven Fixed-Budget ASR Personalization for Accented Speakers" in ICASSP 2021 ☆11 Updated 3 years ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆90 Updated 3 weeks ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models. ☆12 Updated 3 years ago