MiniXC / opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without loading everything into memory at once. Works well with PyTorch.
☆13 · Updated 5 years ago
Alternatives and similar repositories for opensubtitles-dataloader
Users who are interested in opensubtitles-dataloader are comparing it to the libraries listed below.
- MILES is a multilingual text simplifier inspired by LSBert, a BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆50 · Updated 4 years ago
- A Domain Specific Language (DSL) for building language patterns. These can later be compiled into spaCy patterns, pure regex, or any othe… ☆68 · Updated 3 years ago
- Conversational text analysis using various NLP techniques ☆182 · Updated 2 years ago
- A Python module for word inflections designed for use with spaCy. ☆93 · Updated 5 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆57 · Updated 4 years ago
- Test prompts for GPT-J-6B and the resulting AI-generated texts ☆53 · Updated 4 years ago
- spaCy match and replace, maintaining conjugation ☆36 · Updated 3 years ago
- ☆70 · Updated 3 years ago
- Abydos NLP/IR library for Python ☆194 · Updated 3 years ago
- 🕊️ Radically lightweight command-line interfaces ☆108 · Updated 4 months ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset. ☆27 · Updated 4 years ago
- Question Generation - Question Answering for Automatic Flashcards ☆66 · Updated 3 years ago
- 🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library ☆103 · Updated 6 months ago
- The world's largest social media toxicity dataset. ☆189 · Updated 3 years ago
- 🔎 A Prodigy plugin for evaluating spaCy pipelines ☆13 · Updated last year
- Lazy, a tool for running things in idle time ☆48 · Updated 4 years ago
- A utility for labeling clusters of text data. ☆28 · Updated 4 years ago
- Extremely easy-to-use sequence-to-sequence library with attention, for text-to-text conversion tasks. ☆39 · Updated 5 years ago
- Efficiently computing & storing token n-grams from large corpora ☆26 · Updated last year
- Using queues, tqdm-multiprocess supports multiple worker processes, each with multiple tqdm progress bars, displaying them cleanly throug… ☆42 · Updated 5 years ago
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆331 · Updated 9 months ago
- Loadable spellfix1 extension for SQLite as a Python package ☆27 · Updated last year
- Fast SymSpell written in C++ and exposed to Python via pybind11 ☆44 · Updated 10 months ago
- Cleaning tool for web-scraped text ☆38 · Updated 2 years ago
- 🔤 Measure edit distance based on keyboard layout ☆63 · Updated 3 months ago
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 · Updated 2 years ago
- Finds linguistic patterns effortlessly ☆39 · Updated 2 years ago
- Experiments with generating GPT-2 fanfiction on specified topics. ☆11 · Updated 6 years ago
- Visual Automata is a Python 3 library built as a wrapper for the Automata library to add more visualization features. ☆57 · Updated 2 years ago
- Custom Natural Language Processing with big and small models 🌲🌱 ☆66 · Updated 4 years ago