MiniXC / opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without loading everything into memory at once. Works well with PyTorch.
☆13 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for opensubtitles-dataloader
- MILES is a multilingual text simplifier inspired by LSBert, a BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆48 · Updated 3 years ago
- spaCy match and replace, maintaining conjugation ☆34 · Updated last year
- Question Generation - Question Answering for Automatic Flashcards ☆64 · Updated 2 years ago
- A file utility for accessing both local and remote files through a unified interface. ☆36 · Updated this week
- Tooling to play around with multilingual machine translation for Indian Languages. ☆21 · Updated 2 years ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 · Updated 11 months ago
- A python module for word inflections designed for use with spaCy. ☆92 · Updated 4 years ago
- LM Pretraining with PyTorch/TPU ☆132 · Updated 5 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 3 months ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆75 · Updated 2 months ago
- Extremely easy to use sequence to sequence library with attention, for text to text conversion tasks. ☆39 · Updated 4 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆31 · Updated 7 months ago
- Source code and data for Like a Good Nearest Neighbor ☆28 · Updated 9 months ago
- Documentation effort for the BookCorpus dataset ☆33 · Updated 3 years ago
- ☆21 · Updated last week
- A corpus of Python programs annotated with contracts ☆20 · Updated 2 years ago
- German small and large versions of GPT2. ☆20 · Updated 2 years ago
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 · Updated 10 months ago
- ☆16 · Updated 2 years ago
- Open source library for few shot NLP ☆77 · Updated last year
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆60 · Updated last year
- Babysit your preemptible TPUs ☆84 · Updated last year
- Resources for GLUE benchmark in Spanish ☆15 · Updated 3 years ago
- ☆18 · Updated 2 years ago
- Code for the paper "Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm" (https://arxiv.org/abs/2007.14966). ☆57 · Updated 2 years ago
- Confection: the sweetest config system for Python ☆178 · Updated 5 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 · Updated 6 months ago
- HomebrewNLP in JAX flavour for maintainable TPU-Training ☆46 · Updated 10 months ago
- ☆42 · Updated last year
- Python Finite-State Toolkit ☆45 · Updated 2 weeks ago