MiniXC / opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without loading everything into memory at once. Works well with PyTorch.
☆13 Updated 5 years ago
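As a rough illustration of what loading a corpus "without loading everything into memory at once" can look like, the sketch below indexes the byte offset of each line once and then reads individual lines on demand. This is not the repository's actual API: `LazyLineDataset` and `make_demo_file` are hypothetical names, and the one-sentence-per-line file layout is an assumption.

```python
import os
import tempfile


class LazyLineDataset:
    """Map-style dataset over a text file with one sentence per line.

    Hypothetical helper for illustration; not the repository's API.
    Only byte offsets are kept in memory; line text is read on demand.
    """

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # Single pass over the file to record where each line starts.
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                self.offsets.append(pos)
                pos += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Seek straight to the requested line instead of loading the file.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            return f.readline().decode("utf-8").rstrip("\r\n")


def make_demo_file():
    """Write a tiny stand-in corpus (assumed format) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        f.write("Hello there.\nHow are you?\nFine, thanks.\n")
    return path
```

Because the class implements `__len__` and `__getitem__`, an instance should be usable as a map-style dataset with `torch.utils.data.DataLoader`, which is one plausible way such a loader could fit into a PyTorch pipeline.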
Alternatives and similar repositories for opensubtitles-dataloader
Users who are interested in opensubtitles-dataloader are comparing it to the libraries listed below.
- Test prompts for GPT-J-6B and the resulting AI-generated texts ☆53 Updated 4 years ago
- Question Generation - Question Answering for Automatic Flashcards ☆66 Updated 3 years ago
- Conversational text Analysis using various NLP techniques ☆181 Updated 2 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆53 Updated 4 years ago
- 🕊️ Radically lightweight command-line interfaces ☆106 Updated 2 weeks ago
- MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆49 Updated 4 years ago
- Lazy, a tool for running things in idle time ☆48 Updated 4 years ago
- Weird A.I. Yankovic neural-net based lyrics parody generator ☆84 Updated 3 years ago
- A slightly opinionated iPython profile for interactive development ☆23 Updated 3 years ago
- Flenser is a simple, minimal, automated exploratory data analysis tool. ☆78 Updated 5 months ago
- Which ML are you? ☆13 Updated 2 years ago
- NUBIA (NeUral Based Interchangeability Assessor) is a new SoTA evaluation metric for text generation ☆53 Updated 2 years ago
- An opinionated, organized way to start and manage data science experiments. ☆15 Updated 5 years ago
- ☆18 Updated 3 years ago
- 🔤 Measure edit distance based on keyboard layout ☆61 Updated last year
- Vectory provides a collection of tools to track and compare embedding versions. ☆71 Updated 2 years ago
- ☆20 Updated 4 years ago
- A utility for labeling clusters of text data. ☆28 Updated 4 years ago
- A corpus of Python programs annotated with contracts ☆24 Updated 3 years ago
- Efficiently computing & storing token n-grams from large corpora ☆26 Updated 11 months ago
- Babysit your preemptible TPUs ☆86 Updated 2 years ago
- Experiments with generating GPT-2 fanfiction on specified topics. ☆11 Updated 6 years ago
- Abydos NLP/IR library for Python ☆190 Updated 2 years ago
- A python module for word inflections designed for use with spaCy. ☆93 Updated 5 years ago
- That Metric Timeline (TMT) is a Python library aimed at the machine/deep learning practitioner/researcher. It helps tracking experiments,… ☆21 Updated 2 years ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset. ☆27 Updated 3 years ago
- A minimal Python kernel so you can run Python in your Python ☆39 Updated 3 years ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ☆12 Updated 2 years ago
- Lightning Fast Language Prediction 🚀 ☆167 Updated last month
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago