MiniXC / opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with PyTorch.
☆13 · Updated 4 years ago
Alternatives and similar repositories for opensubtitles-dataloader:
Users who are interested in opensubtitles-dataloader are comparing it to the libraries listed below.
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆60 · Updated last year
- German small and large versions of GPT2. ☆20 · Updated 2 years ago
- LTG-Bert ☆29 · Updated last year
- Fast Neural Machine Translation in C++ - development repository ☆19 · Updated 8 months ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆77 · Updated 4 months ago
- Library for fast text representation and classification. ☆28 · Updated last year
- MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆48 · Updated 3 years ago
- Grammar Induction using a Template Tree Approach ☆43 · Updated 2 years ago
- ☆30 · Updated 4 years ago
- ☆38 · Updated 2 years ago
- Efficiently computing & storing token n-grams from large corpora ☆17 · Updated 3 months ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Updated last year
- Conversational text Analysis using various NLP techniques ☆179 · Updated last year
- Confection: the sweetest config system for Python ☆182 · Updated 8 months ago
- Extremely easy to use sequence to sequence library with attention, for text to text conversion tasks. ☆39 · Updated 4 years ago
- A file utility for accessing both local and remote files through a unified interface. ☆36 · Updated 2 weeks ago
- Implementation of N-Grammer in Flax ☆16 · Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- A utility for labeling clusters of text data. ☆28 · Updated 3 years ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆22 · Updated 2 months ago
- Loadable spellfix1 extension for sqlite as python package ☆25 · Updated 9 months ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 · Updated last year
- Podium: a framework agnostic Python NLP library for data loading and preprocessing ☆60 · Updated 2 years ago
- Experiments with Hugging Face 🔬 🤗 ☆45 · Updated 5 months ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆30 · Updated 10 months ago
- NoPdb: Non-interactive Python Debugger ☆83 · Updated 2 years ago
- ☆87 · Updated 2 years ago
- ✨ Ravestate is Roboy's reactive dialogue state library. ☆25 · Updated 2 years ago
- A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+ ☆37 · Updated 3 years ago