oscar-project / goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
☆86 · Apr 21, 2021 · Updated 4 years ago
Alternatives and similar repositories for goclassy
Users that are interested in goclassy are comparing it to the libraries listed below
- Converter from UD-trees to BART representation ☆36 · Mar 6, 2024 · Updated last year
- German GPT-2 model ☆32 · Aug 17, 2021 · Updated 4 years ago
- Tools to download and cleanup Common Crawl data ☆1,039 · Apr 25, 2023 · Updated 2 years ago
- Highly specialized crate to parse and use `google/sentencepiece`'s precompiled_charsmap in `tokenizers` ☆19 · Jan 8, 2026 · Updated last month
- ☆18 · Mar 20, 2022 · Updated 3 years ago
- NLP library designed for reproducible experimentation management ☆294 · Jul 25, 2024 · Updated last year
- Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT". ☆35 · Mar 16, 2022 · Updated 3 years ago
- speech engine training projects ☆29 · Apr 19, 2021 · Updated 4 years ago
- ☆12 · Dec 8, 2022 · Updated 3 years ago
- Load, build and explore Patstat using the Google Cloud Platform ☆10 · Jan 19, 2019 · Updated 7 years ago
- ☆40 · May 2, 2021 · Updated 4 years ago
- pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference ☆61 · Dec 8, 2022 · Updated 3 years ago
- Viewer for the 🤗 datasets library. ☆86 · Jul 30, 2021 · Updated 4 years ago
- Analytic platform for the HAL research archive (in development) ☆13 · Oct 2, 2020 · Updated 5 years ago
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 2 years ago
- Official code for AAAI'20 paper "Merging Weak and Active Supervision for Semantic Parsing" ☆11 · Dec 8, 2022 · Updated 3 years ago
- ☆11 · Jan 10, 2020 · Updated 6 years ago
- Build a TensorFlow Lite based computer vision emoji input device with OpenMV 📷 → ✋ 👎 👍 👊 ☆11 · Nov 28, 2022 · Updated 3 years ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆12 · Aug 2, 2022 · Updated 3 years ago
- Plan and train German transformer models. ☆23 · Feb 22, 2021 · Updated 4 years ago
- Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding. ☆50 · Jun 27, 2021 · Updated 4 years ago
- Small python package to measure OCR quality and other related metrics. ☆27 · Feb 19, 2024 · Updated last year
- ☆28 · Nov 28, 2021 · Updated 4 years ago
- ☆99 · Jul 7, 2020 · Updated 5 years ago
- ☆10 · Jul 15, 2024 · Updated last year
- A Docker Wrapper to make the machine easily learn any language on top of INRIA OSCAR dataset using GPT2 ☆12 · Jan 30, 2020 · Updated 6 years ago
- Python package to compute metrics on an NLU intent parsing pipeline ☆13 · Mar 10, 2020 · Updated 5 years ago
- Specification of a stand-off element for the TEI guidelines ☆12 · Apr 29, 2021 · Updated 4 years ago
- ☆12 · Nov 15, 2016 · Updated 9 years ago
- ☆13 · Mar 27, 2020 · Updated 5 years ago
- A Translation Task using TurboTransformers ☆11 · Dec 17, 2020 · Updated 5 years ago
- A Machine Learning tool to create the training dataset very quickly & easily by using a smart chrome extension ☆14 · Feb 11, 2023 · Updated 3 years ago
- A python library to generate highly realistic typos (fuzz-testing) ☆13 · Mar 16, 2025 · Updated 11 months ago
- ☆44 · Jun 17, 2015 · Updated 10 years ago
- Implementation of Nested Named Entity Recognition using Flair ☆24 · Oct 29, 2021 · Updated 4 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆76 · Sep 13, 2023 · Updated 2 years ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,924 · Feb 14, 2023 · Updated 3 years ago
- Inter-annotator agreement for Doccano ☆28 · May 3, 2020 · Updated 5 years ago
- ☆45 · Nov 3, 2019 · Updated 6 years ago