MiniXC / opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without loading everything into memory at once. Works well with PyTorch.
☆13 · Updated 5 years ago
Alternatives and similar repositories for opensubtitles-dataloader
Users who are interested in opensubtitles-dataloader are comparing it to the libraries listed below.
- spaCy match and replace, maintaining conjugation ☆36 · Updated 2 years ago
- Conversational text analysis using various NLP techniques ☆182 · Updated 2 years ago
- Test prompts for GPT-J-6B and the resulting AI-generated texts ☆53 · Updated 4 years ago
- MILES is a multilingual text simplifier inspired by LSBert, a BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆50 · Updated 4 years ago
- A utility for labeling clusters of text data. ☆28 · Updated 4 years ago
- Question Generation - Question Answering for Automatic Flashcards ☆66 · Updated 3 years ago
- A Python module for word inflections designed for use with spaCy. ☆93 · Updated 5 years ago
- 🕊️ Radically lightweight command-line interfaces ☆109 · Updated 2 months ago
- Lazy, a tool for running things in idle time ☆48 · Updated 4 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆56 · Updated 4 years ago
- ☆70 · Updated 2 years ago
- A Domain Specific Language (DSL) for building language patterns. These can later be compiled into spaCy patterns, pure regex, or any othe… ☆68 · Updated 3 years ago
- Vectory provides a collection of tools to track and compare embedding versions. ☆71 · Updated 2 years ago
- Abydos NLP/IR library for Python ☆192 · Updated 3 years ago
- Weird A.I. Yankovic neural-net based lyrics parody generator ☆84 · Updated 3 years ago
- Confection: the sweetest config system for Python ☆191 · Updated 2 weeks ago
- A slightly opinionated IPython profile for interactive development ☆23 · Updated 3 years ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆77 · Updated 3 years ago
- Efficient teacher-student models and scripts to make them ☆52 · Updated last year
- NoPdb: Non-interactive Python Debugger ☆84 · Updated 3 years ago
- ☆18 · Updated 3 years ago
- Common Voice Dataset explorer ☆27 · Updated 3 years ago
- A lightweight Python library for constructing, processing, and visualizing constituent trees. ☆68 · Updated this week
- 🔤 Measure edit distance based on keyboard layout ☆61 · Updated last month
- Topic inference with zero-shot models ☆61 · Updated 2 years ago
- ☆17 · Updated last year
- Tooling to play around with multilingual machine translation for Indian languages. ☆22 · Updated 3 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆111 · Updated 5 months ago
- A Python package to simulate typographical errors. ☆38 · Updated last year
- Loadable spellfix1 extension for SQLite as a Python package ☆26 · Updated last year