downloads and parses subtitle dataset from opensubtitles.org
☆15 · Apr 19, 2024 · Updated 2 years ago
Alternatives and similar repositories for Opensubtitles_dataset
Users that are interested in Opensubtitles_dataset are comparing it to the libraries listed below.
- Script for downloading GitHub. ☆13 · Sep 24, 2020 · Updated 5 years ago
- [LREC 2024] Resource and Tool for Writing System Identification ☆21 · Mar 29, 2026 · Updated last month
- Downloads 2020 English Wikipedia articles as plaintext ☆27 · Mar 25, 2023 · Updated 3 years ago
- my configuration files ☆14 · Nov 16, 2025 · Updated 5 months ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 · Nov 29, 2023 · Updated 2 years ago
- StyleGAN2 - Official TensorFlow Implementation ☆12 · Jul 15, 2020 · Updated 5 years ago
- phone inventory library ☆17 · May 15, 2023 · Updated 2 years ago
- Dataset of Canada goose images with annotations of bounding boxes with object classes, suitable for testing object detection algorithms. ☆40 · Aug 2, 2018 · Updated 7 years ago
- A simple, minimalist writing theme for Typora ☆15 · Jan 20, 2026 · Updated 3 months ago
- A TinyStories LM with SAEs and transcoders ☆14 · Apr 3, 2025 · Updated last year
- Extensible DL-based automatic Arabic diacritization tool allowing the restoration of different types of diacritics. ☆22 · Jul 25, 2023 · Updated 2 years ago
- Remove generated stories with stray unicode characters ☆12 · Jan 3, 2024 · Updated 2 years ago
- ☆28 · Nov 28, 2024 · Updated last year
- ☆95 · Jul 16, 2022 · Updated 3 years ago
- ☆21 · Oct 20, 2022 · Updated 3 years ago
- An extension of thu-spmi/CAT which contains a full-fledged implementation of CTC-CRF for Tensorflow. ☆12 · Jul 5, 2021 · Updated 4 years ago
- Rudimentary snippets facility for ScIDE, implemented in sclang ☆13 · Oct 20, 2022 · Updated 3 years ago
- SemEval 2020 task 10 datasets ☆17 · Feb 19, 2020 · Updated 6 years ago
- Helper to use Plotly in SvelteKit ☆18 · Jul 12, 2022 · Updated 3 years ago
- The case study and multilingual performance of ICASSP submission ☆24 · Sep 24, 2022 · Updated 3 years ago
- MaltParser for Russian ☆12 · Mar 10, 2019 · Updated 7 years ago
- Some mathematical extensions to SuperCollider ☆16 · Jul 19, 2023 · Updated 2 years ago
- ☆19 · Jan 28, 2022 · Updated 4 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆36 · Jun 29, 2025 · Updated 10 months ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆87 · Dec 6, 2023 · Updated 2 years ago
- Visual Hash for matching copies of visually similar images. ☆16 · Mar 17, 2025 · Updated last year
- simple kv store for streams ☆36 · Mar 14, 2013 · Updated 13 years ago
- Using queues, tqdm-multiprocess supports multiple worker processes, each with multiple tqdm progress bars, displaying them cleanly throug… ☆43 · Jan 6, 2021 · Updated 5 years ago
- ☆33 · May 23, 2023 · Updated 2 years ago
- MIDict (Multi-Index Dict) can be indexed by any "keys" or "values", suitable as a bidirectional/inverse dict or a multi-key/multi-value d… ☆14 · May 19, 2016 · Updated 9 years ago
- Precise type-checker for JavaScript ☆11 · Oct 23, 2025 · Updated 6 months ago
- A utility to read and write PDFs with Python ☆12 · Apr 28, 2022 · Updated 4 years ago
- Learned string similarity for entity names using optimal transport. ☆35 · Nov 17, 2020 · Updated 5 years ago
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆42 · Apr 5, 2022 · Updated 4 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆46 · Sep 22, 2020 · Updated 5 years ago
- Python wrapper for Google's syntaxnet ☆15 · Apr 8, 2019 · Updated 7 years ago
- Dockerized version of Google's SyntaxNet Parser and POS tagger for Russian + standalone server. ☆16 · May 4, 2017 · Updated 9 years ago
- Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training. ☆14 · Jun 27, 2023 · Updated 2 years ago