MiniXC / opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without loading everything into memory at once. Works well with PyTorch.
☆13 · Updated 4 years ago
Alternatives and similar repositories for opensubtitles-dataloader
Users who are interested in opensubtitles-dataloader are comparing it to the libraries listed below.
- MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆49 · Updated 4 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Updated last year
- Cortex-compatible model server for Python and TensorFlow ☆17 · Updated 2 years ago
- Language detection using Spacy and Fasttext ☆55 · Updated last year
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆61 · Updated last year
- A slightly opinionated iPython profile for interactive development ☆23 · Updated 3 years ago
- Grammar Induction using a Template Tree Approach ☆46 · Updated last month
- xfspell - the Transformer Spell Checker ☆190 · Updated 4 years ago
- A python module for word inflections designed for use with spaCy. ☆92 · Updated 5 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆81 · Updated 8 months ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- Library for fast text representation and classification. ☆28 · Updated last year
- ☆56 · Updated 2 years ago
- Experiments with Hugging Face 🔬 🤗 ☆44 · Updated 9 months ago
- Extremely easy to use sequence to sequence library with attention, for text to text conversion tasks. ☆39 · Updated 4 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆107 · Updated last week
- Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight. ☆70 · Updated 3 months ago
- Python Finite-State Toolkit ☆55 · Updated this week
- Question Generation - Question Answering for Automatic Flashcards ☆64 · Updated 3 years ago
- negate_sentence(A Python module that doesn't negate sentences.) ☆31 · Updated 7 months ago
- A python true casing utility that restores case information for texts ☆88 · Updated 2 years ago
- Resources for GLUE benchmark in Spanish ☆15 · Updated 4 years ago
- A guide to building language technology in new languages. ☆58 · Updated 3 years ago
- Fast Neural Machine Translation in C++ - development repository ☆19 · Updated last year
- ☆76 · Updated 3 years ago
- German small and large versions of GPT2. ☆20 · Updated 3 years ago
- Efficiently computing & storing token n-grams from large corpora ☆23 · Updated 8 months ago
- LTG-Bert ☆33 · Updated last year
- Test prompts for GPT-J-6B and the resulting AI-generated texts ☆53 · Updated 3 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆25 · Updated 6 months ago