sdtblck / Opensubtitles_dataset
Downloads and parses a subtitle dataset from opensubtitles.org
☆16 Updated last year
Alternatives and similar repositories for Opensubtitles_dataset
Users who are interested in Opensubtitles_dataset are comparing it to the repositories listed below.
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆51 Updated last week
- A guide to building language technology in new languages. ☆58 Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated 9 months ago
- Automatic extraction of edited sentences from text edition histories. ☆83 Updated 3 years ago
- ☆90 Updated 2 years ago
- ☆74 Updated 3 months ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 2 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale ☆155 Updated last year
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆43 Updated 4 years ago
- 🖋 Resource and Tool for Writing System Identification -- LREC 2024 ☆16 Updated last year
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆38 Updated 3 years ago
- The pipeline for the OSCAR corpus ☆169 Updated last year
- phone inventory library ☆16 Updated 2 years ago
- A python true casing utility that restores case information for texts ☆89 Updated 2 years ago
- German small and large versions of GPT2. ☆20 Updated 3 years ago
- Implementation of the GBST block from the Charformer paper, in Pytorch ☆117 Updated 3 years ago
- Code and models used in "MUSS Multilingual Unsupervised Sentence Simplification by Mining Paraphrases". ☆98 Updated 2 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆56 Updated 2 years ago
- A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+ ☆38 Updated 4 years ago
- Hidden Engrams: Long Term Memory for Transformer Model Inference ☆35 Updated 4 years ago
- Bicleaner fork that uses neural networks ☆40 Updated 2 weeks ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 Updated 3 years ago
- 📝 An easy-to-use package to restore punctuation of the text. ☆116 Updated 2 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆30 Updated 4 months ago
- Complementary code for our paper Automatic punctuation restoration with BERT models ☆50 Updated last year
- Easier Automatic Sentence Simplification Evaluation ☆162 Updated last year
- A flexible sentence segmentation library using CRF model and regex rules ☆29 Updated last year
- 🫠 check your data, before you wreck your model ☆16 Updated 2 years ago
- A tool that locates, downloads, and extracts machine translation corpora ☆155 Updated last month