oscar-project/goclassy

An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.

☆86

Alternatives and similar repositories for goclassy

Users interested in goclassy also compare it with the repositories listed below.

oscar-project / ungoliant
The pipeline for the OSCAR corpus
☆178 | Nov 9, 2025 | Updated 8 months ago
allenai / pybart
Converter from UD-trees to BART representation
☆35 | Mar 6, 2024 | Updated 2 years ago
cverluise / openPatstat
Load, build and explore Patstat using the Google Cloud Platform
☆10 | Jan 19, 2019 | Updated 7 years ago
stefan-it / ukrainian-electra
Ukrainian ELECTRA model
☆12 | Mar 11, 2023 | Updated 3 years ago
stefan-it / german-gpt2
German GPT-2 model
☆32 | Aug 17, 2021 | Updated 4 years ago
facebookresearch / cc_net
Tools to download and cleanup Common Crawl data
☆1,046 | Apr 25, 2023 | Updated 3 years ago
julien-c / trainer-proposal
☆13 | Mar 27, 2020 | Updated 6 years ago
istex-archives / istex-browser-extension
ISTEX button: a web extension that dynamically inserts, on the web page being viewed, a link to the full text of a document if the lat…
☆11 | May 30, 2023 | Updated 3 years ago
zdou0830 / MetaNLP
☆11 | Jan 10, 2020 | Updated 6 years ago
ltgoslo / simple_elmo_training
Minimal code to train ELMo models in recent versions of TensorFlow
☆14 | Jun 16, 2026 | Updated last month
German-NLP-Group / german-transformer-training
Plan and train German transformer models.
☆23 | Feb 22, 2021 | Updated 5 years ago
GeorgeVern / smala
Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages".
☆13 | Sep 17, 2021 | Updated 4 years ago
softcite / softcite_kb
A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources
☆18 | May 14, 2023 | Updated 3 years ago
harvardnlp / strux
☆18 | Mar 20, 2022 | Updated 4 years ago
cindyxinyiwang / emea
☆13 | Dec 11, 2021 | Updated 4 years ago
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆359 | Feb 22, 2022 | Updated 4 years ago
vidurj / parser-adaptation
☆12 | Dec 8, 2022 | Updated 3 years ago
thespectrewithin / joint_align
Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework
☆52 | Feb 1, 2020 | Updated 6 years ago
vwoloszyn / diaa
Inter-annotator agreement for Doccano
☆28 | May 3, 2020 | Updated 6 years ago
harvardnlp / hmm-lm
☆40 | May 2, 2021 | Updated 5 years ago
nikitakit / sabertooth
Standalone pre-training recipe with JAX+Flax
☆35 | Apr 3, 2023 | Updated 3 years ago
alexandra-chron / relm_unmt
Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT".
☆35 | Mar 16, 2022 | Updated 4 years ago
miroozyx / BERT_with_keras
A Keras version of Google's BERT model
☆35 | Nov 4, 2019 | Updated 6 years ago
violet-zct / fairseq-dro-mnmt
☆14 | Sep 10, 2021 | Updated 4 years ago
feedly / transfer-nlp
NLP library designed for reproducible experimentation management
☆294 | Jul 25, 2024 | Updated last year
commoncrawl / language-detection-cld2
Natural language detection, Java bindings for CLD2
☆17 | Feb 26, 2026 | Updated 4 months ago
tnhaider / poetry-emotion
Poetry Corpora Annotated on Aesthetic Emotions
☆13 | Aug 2, 2022 | Updated 3 years ago
aiintelligentsystems / next-level-bert
☆16 | Jun 14, 2024 | Updated 2 years ago
kyunghyuncho / backprop-kalman-filter
☆45 | Nov 3, 2019 | Updated 6 years ago
Hyperparticle / udify
A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology…
☆225 | Dec 20, 2022 | Updated 3 years ago
yahshibu / nested-ner-tacl2020-flair
Implementation of Nested Named Entity Recognition using Flair
☆24 | Oct 29, 2021 | Updated 4 years ago
iapp-technology / iapp-wiki-qa-dataset
Open Thai Wikipedia QA Dataset made by iApp Technology
☆14 | Feb 17, 2021 | Updated 5 years ago
MC-BERT / MC-BERT
☆99 | Jul 7, 2020 | Updated 6 years ago
facebookresearch / MLQA
New dataset
☆311 | Aug 31, 2021 | Updated 4 years ago
kpu / preprocess
Corpus preprocessing
☆100 | Mar 16, 2024 | Updated 2 years ago
cahya-wirawan / artificial-commonvoice
Common Voice Generator using Speech Synthesizer
☆14 | Jul 28, 2021 | Updated 4 years ago
swabhs / scaffolding
Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding.
☆50 | Jun 27, 2021 | Updated 5 years ago
john-hewitt / dyckkm-learning
Codebase implementing LMs for learning the Dyck-(k,m) bounded hierarchical language
☆16 | Oct 11, 2020 | Updated 5 years ago
browsermt / students
Efficient teacher-student models and scripts to make them
☆57 | Dec 16, 2023 | Updated 2 years ago