CyberZHG/wiki-dump-reader

Extract corpora from Wikipedia dumps

☆26

Alternatives and similar repositories for wiki-dump-reader

Users that are interested in wiki-dump-reader are comparing it to the libraries listed below.

cambridgeltl / adversarial-postspec
Auxiliary GAN for WE post-specialisation
☆24 · Feb 22, 2019 · Updated 7 years ago
dialogue-evaluation / GramEval2020
☆24 · Apr 15, 2021 · Updated 5 years ago
zeeeyang / two-local-neural-conparsers
Span and Rule Models for Neural Constituent Parsing
☆10 · Jun 11, 2018 · Updated 8 years ago
AxelSorensenDev / Eevee
An Easy Annotation Tool for Natural Language Processing
☆12 · May 17, 2024 · Updated 2 years ago
GFNOrg / multi-objective-gfn
Code for "Multi-Objective GFlowNets"
☆20 · Jul 12, 2023 · Updated 3 years ago
riiswa / covid19-attestation-deplacement-form
Online form that generates a travel exemption certificate ("attestation de déplacement dérogatoire")
☆10 · Mar 18, 2020 · Updated 6 years ago
jhu-ep-coursera / book-searchy
☆12 · Jan 3, 2023 · Updated 3 years ago
Riccorl / sense-embedding
BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText
☆10 · Sep 3, 2019 · Updated 6 years ago
genekogan / wifi_geolocation
Get your latitude/longitude via wifi access points
☆16 · Sep 25, 2012 · Updated 13 years ago
sinaahmadi / wergor
Rule-based Kurdish Transliterator
☆11 · May 3, 2024 · Updated 2 years ago
Shuailong / SentenceClassifier
PyTorch Sentence Classifier (CNN RNN)
☆11 · May 17, 2018 · Updated 8 years ago
0xdolan / mymemory
MyMemory Dictionary (https://mymemory.translated.net)
☆10 · Nov 20, 2021 · Updated 4 years ago
iLanguage / iLanguage
A semi-unsupervised language independent morphological analyzer useful for stemming unknown language text, or getting a rough estimate of…
☆22 · Nov 28, 2017 · Updated 8 years ago
kocmitom / LanideNN
☆18 · Jan 21, 2021 · Updated 5 years ago
naiveHobo / HoboNet
Convolutional Neural Network for classifying semantic relations in a sentence
☆17 · Aug 24, 2017 · Updated 8 years ago
bshillingford / ctc-beam-search
CTC beam search
☆12 · Oct 26, 2016 · Updated 9 years ago
NLP-Core-Team / mmlu_ru
MMLU eval for RU/EN
☆16 · Jul 31, 2023 · Updated 2 years ago
sigmorphon / conll2018
☆15 · Aug 14, 2018 · Updated 7 years ago
acoli-repo / acoli-dicts
3000+ machine-readable open source dictionaries distributed by the Applied Computational Linguistics lab at the University of Augsburg, G…
☆17 · Jul 19, 2023 · Updated 3 years ago
unimorph / wiktionary-tools
Tools for scraping, annotating, and parsing morphological information from Wiktionary
☆15 · Oct 19, 2019 · Updated 6 years ago
haskiindahouse / realtime-pulse-and-respiratory-rate-detection
Second coursework and case study from the Sber Data Science competition. Links to the scientific papers it refers to will be added here.
☆13 · Nov 12, 2021 · Updated 4 years ago
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆19 · Apr 18, 2026 · Updated 3 months ago
Shuailong / StockPrediction
An attempt to use financial news to predict the stock market
☆16 · Nov 17, 2018 · Updated 7 years ago
elnagara / HARD-Arabic-Dataset
Hotels Arabic-Reviews Dataset
☆34 · Dec 18, 2018 · Updated 7 years ago
hellokaton / blog
blog source code
☆16 · Nov 7, 2024 · Updated last year
soumith / the-incredible-pytorch
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.
☆16 · Mar 24, 2017 · Updated 9 years ago
adamhollett / jekyll-and-slide
Markdown slides in Jekyll with reveal.js
☆15 · Mar 10, 2022 · Updated 4 years ago
wanghm92 / Sing_Par
Forked from tdozat/Parser and adapted to TensorFlow 0.12
☆19 · Mar 21, 2021 · Updated 5 years ago
SapienzaNLP / xl-amr
XL-AMR is a sequence-to-graph cross-lingual AMR parser that exploits transfer learning (EMNLP2020).
☆17 · Jul 25, 2024 · Updated last year
erfannoury / GoodSearcher
A pyLucene-based search module for searching books from goodreads.com
☆26 · Oct 15, 2017 · Updated 8 years ago
AKSW / DBNQA
DBpedia Neural Question Answering Dataset
☆19 · Jun 28, 2020 · Updated 6 years ago
cisnlp / GlotScript
[LREC 2024] 🖋 Resource and Tool for Writing System Identification
☆22 · Mar 29, 2026 · Updated 3 months ago
nisavid / pyrdb2rdf
A Python library for RDB2RDF Direct Mapping and R2RML.
☆22 · Apr 12, 2024 · Updated 2 years ago
czcorpus / InterText_editor
Editor for aligned parallel texts (personal desktop application).
☆20 · Jan 15, 2026 · Updated 6 months ago
zhenyuanlu / awesome-pain-intensity-classification-papers
A comprehensive list of pain intensity classification papers mainly based on deep learning algorithms
☆12 · Oct 20, 2024 · Updated last year
enabling-languages / python-i18n
Random notes on Python internationalisation
☆19 · Aug 10, 2023 · Updated 2 years ago
kpu / fasterText
Library for fast text representation and classification.
☆31 · Jan 9, 2024 · Updated 2 years ago
duytinvo / ijcai2015
Target-dependent Twitter Sentiment Classification with Rich Automatic Features
☆22 · Jul 20, 2016 · Updated 10 years ago
nyurik / lexicator
Imports Wiktionary's grammatical data into Wikidata
☆18 · Jan 11, 2020 · Updated 6 years ago