sdtblck/Opensubtitles_dataset

Downloads and parses a subtitle dataset from opensubtitles.org.

☆15

Alternatives and similar repositories for Opensubtitles_dataset

Users who are interested in Opensubtitles_dataset are comparing it to the repositories listed below.

noanabeshima / github-downloader
Script for downloading GitHub.
☆13 · Sep 24, 2020 · Updated 5 years ago
noanabeshima / wikipedia-downloader
Downloads 2020 English Wikipedia articles as plaintext
☆27 · Mar 25, 2023 · Updated 3 years ago
EleutherAI / best-download
URL downloader supporting checkpointing and continuous checksumming.
☆19 · Nov 29, 2023 · Updated 2 years ago
dmort27 / HsSPE
Haskell phonology library.
☆10 · Jan 23, 2012 · Updated 14 years ago
sdtblck / stylegan2
StyleGAN2 - Official TensorFlow Implementation
☆12 · Jul 15, 2020 · Updated 6 years ago
thoppe / The-Pile-FreeLaw
Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.
☆16 · Jun 3, 2023 · Updated 3 years ago
steggie3 / goose-dataset
Dataset of Canada goose images with annotations of bounding boxes with object classes, suitable for testing object detection algorithms.
☆41 · Aug 2, 2018 · Updated 7 years ago
noanabeshima / tinymodel
A TinyStories LM with SAEs and transcoders
☆14 · Apr 3, 2025 · Updated last year
ad8e / TinyStories-cleaner
Remove generated stories with stray unicode characters
☆12 · Jan 3, 2024 · Updated 2 years ago
EleutherAI / openwebtext2
☆94 · Jul 16, 2022 · Updated 4 years ago
TeaPoly / warp-ctc-crf
An extension of thu-spmi/CAT which contains a full-fledged implementation of CTC-CRF for Tensorflow.
☆12 · Jul 5, 2021 · Updated 5 years ago
RiTUAL-MBZUAI / SemEval2020_Task10_Emphasis_Selection
SemEval 2020 task 10 datasets
☆17 · Feb 19, 2020 · Updated 6 years ago
Yaoming95 / UniPunc
The case study and multilingual performance of the ICASSP submission
☆24 · Sep 24, 2022 · Updated 3 years ago
shawwn / stylegan2
StyleGAN2 - Official TensorFlow Implementation
☆25 · Sep 5, 2020 · Updated 5 years ago
EleutherAI / stackexchange-dataset
Python tools for processing the stackexchange data dumps into a text dataset for Language Models
☆87 · Dec 6, 2023 · Updated 2 years ago
mingruimingrui / ICU-tokenizer
ICU based universal language tokenizer
☆34 · Jan 19, 2022 · Updated 4 years ago
kbatsuren / wiktra
Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies)
☆37 · Jun 29, 2025 · Updated last year
GoFigure-LANL / VisHash
Visual Hash for matching copies of visually similar images.
☆16 · Mar 17, 2025 · Updated last year
ShenggaoZhu / midict
MIDict (Multi-Index Dict) can be indexed by any "keys" or "values", suitable as a bidirectional/inverse dict or a multi-key/multi-value d…
☆14 · May 19, 2016 · Updated 10 years ago
alvations / myth
Myanmar and Thai Language Resources
☆10 · Jul 18, 2022 · Updated 4 years ago
conradj / pocket-public-archive
statically generated weekly digest of articles read in Pocket
☆10 · May 14, 2019 · Updated 7 years ago
briankoser / web-typography-css
A stylesheet based on Richard Rutter's book Web Typography.
☆10 · Dec 6, 2018 · Updated 7 years ago
sylvainpelissier / PyPDF2
A utility to read and write PDFs with Python
☆12 · Apr 28, 2022 · Updated 4 years ago
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
morrisalp / taatiknet
Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training.
☆16 · Jun 27, 2023 · Updated 3 years ago
lorenza12 / English-Words-Definitions-and-Parts-of-Speech
A text file containing English words, along with the definition, parts of speech (noun,verb,adjective,etc.), and a link to the url where …
☆13 · Apr 27, 2024 · Updated 2 years ago
fabiospampinato / is
The definitive collection of is* functions for runtime type checking. Lodash-compatible, tree-shakable, with types.
☆17 · Jul 17, 2026 · Updated last week
nlp-compromise / thumb
generate rules from lists of words
☆16 · Jul 9, 2021 · Updated 5 years ago
fajri91 / minangNLP
Minangkabau NLP corpus. PACLIC 2020
☆11 · Jun 7, 2021 · Updated 5 years ago
nishansubedi / fastText
Library for fast text representation and classification.
☆10 · Apr 17, 2022 · Updated 4 years ago
ahhhh6980 / colortypes
An abstract, safe, and concise color conversion library for Rust nightly. This requires the adt_const_params feature.
☆12 · Nov 18, 2022 · Updated 3 years ago
IINemo / docker-syntaxnet_rus
Dockerized version of Google's SyntaxNet Parser and POS tagger for Russian + standalone server.
☆16 · May 4, 2017 · Updated 9 years ago
interscript / rababa
Rababa, the diacritization library for Arabic and Hebrew (Abjad scripts in general)
☆13 · May 1, 2025 · Updated last year
binarypearl / beepbeep
A menu and CLI based console program to play and write songs for the PC Speaker
☆15 · Aug 1, 2019 · Updated 6 years ago
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago
eligugliotta / tarc
Tunisian Arabish Corpus
☆12 · Mar 12, 2024 · Updated 2 years ago
MathieuLoutre / node-symspell
JavaScript port of SymSpell for Node.js
☆13 · Sep 30, 2022 · Updated 3 years ago
hltcoe / gazetteer-collection
☆12 · Mar 31, 2020 · Updated 6 years ago
languagetool-org / german-pos-dict
German part-of-speech dictionary
☆47 · Sep 6, 2023 · Updated 2 years ago