ad-freiburg / whitespace-correction
Fast whitespace correction with Transformers
☆15 · Updated 9 months ago
Alternatives and similar repositories for whitespace-correction:
Users who are interested in whitespace-correction are comparing it to the libraries listed below:
- A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation (INTERSPEECH 2022) ☆20 · Updated 7 months ago
- Vietnamese Punctuation Prediction using Pretrained Language Models ☆13 · Updated 2 years ago
- Transformation of spoken text into written text ☆30 · Updated 9 months ago
- Whisper finetuned on VinBigdata-VLSP2020-100h + KenLM ☆36 · Updated last year
- zero-vocab or low-vocab embeddings ☆18 · Updated 2 years ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆92 · Updated last year
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 · Updated last year
- BERT-based joint intent detection and slot filling with intent-slot attention mechanism (INTERSPEECH 2021) ☆85 · Updated 7 months ago
- This tool helps automatically generate grammatically valid synthetic Code-mixed data by utilizing linguistic theories such as Equivalenc… ☆53 · Updated 6 months ago
- Correction of spaces with character-based neural language models. ☆13 · Updated 2 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Updated 3 years ago
- Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection. ☆39 · Updated last year
- Finetune multiple pre-trained Transformer-based models to solve the Vietnamese Fake News Detection problem (ReINTEL) in the VLSP2020 shared task ☆18 · Updated 4 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated 11 months ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- Repository containing the open source code of works published at the FBK MT unit. ☆42 · Updated 3 weeks ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 4 months ago
- This code provides a word-level language identification tool for identifying the language of individual words in Code-Mixed text. e.g. The tex… ☆51 · Updated 4 years ago
- Matching The Statements: A Simple and Accurate Model for Key Point Analysis (ArgMining | EMNLP 2021) ☆12 · Updated 3 years ago
- (no description) ☆34 · Updated 4 years ago
- End-to-End Vietnamese Speech Recognition using wav2vec 2.0 ☆96 · Updated 3 years ago
- Robust Cross-lingual Embeddings from Parallel Sentences ☆21 · Updated 4 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Updated last year
- Library for pruning experts per language pair in NLLB-200 ☆32 · Updated last year
- We finetune Bloomz-7b1-mt using LoRA with the chatdoctor-200k dataset here: https://huggingface.co/LinhDuong/doctorwithbloomz-7b1-mt an… ☆30 · Updated last year
- A Robustly Optimized BERT Pretraining Approach for Vietnamese ☆31 · Updated 6 months ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated 2 years ago
- Showcasing training on various NLP downstream tasks with pre-trained language models using PyTorch Lightning ☆13 · Updated 2 years ago
- (no description) ☆43 · Updated 2 years ago