langtech-bsc/Wikiextractor-V2

Enhanced version of WikiExtractor: a Wikipedia dumps extractor

☆30

Alternatives and similar repositories for Wikiextractor-V2

Users that are interested in Wikiextractor-V2 are comparing it to the libraries listed below.

iPieter / llmq
A Scheduler for Batched LLM Inference
☆19 · Oct 5, 2025 · Updated 9 months ago
philschmid / multilingual-serverless-qa-aws-lambda
☆10 · Dec 17, 2020 · Updated 5 years ago
banditburai / daisyft
DaisyUI cli for FastHTML projects
☆27 · May 3, 2025 · Updated last year
informagi / mmead
MS Marco Entity Annotations Disambiguation
☆14 · May 19, 2023 · Updated 3 years ago
maty-bohacek / xgboost-vs-gpt4
Official Implementation of the 'When XGBoost Outperforms GPT-4 on Text Classification: A Case Study' NAACL-W 2024 paper
☆16 · Dec 16, 2024 · Updated last year
kensho-technologies / pathpiece
PathPiece tokenizer
☆14 · Nov 10, 2024 · Updated last year
Ybakman / LLM_Uncertainty
☆12 · Sep 22, 2024 · Updated last year
edoost / pert
Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging
☆10 · Nov 15, 2021 · Updated 4 years ago
gentaiscool / miners
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
☆14 · Oct 3, 2024 · Updated last year
ariG23498 / timm-wrapper-examples
Notebooks to demonstrate TimmWrapper
☆17 · Jan 16, 2025 · Updated last year
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
mrvoh / meta_learning_multilingual_doc_classification
Placeholder repository
☆15 · Mar 16, 2022 · Updated 4 years ago
ForBo7 / fastai-close-reading
Structured close reading (or rather, close watching) transcripts of _almost_ every lesson in Jeremy Howard's Practical Deep Learning for…
☆34 · Mar 9, 2026 · Updated 4 months ago
cisnlp / ofa
[NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago
lianakoleva / no-libtorch-compile
☆21 · Mar 3, 2025 · Updated last year
EleutherAI / best-download
URL downloader supporting checkpointing and continuous checksumming.
☆19 · Nov 29, 2023 · Updated 2 years ago
aus-covid-modelling / NationalCabinetModelling
☆18 · Jul 28, 2023 · Updated 2 years ago
ytabatabaee / Deep-Learning-Material
Notebooks for Deep Learning course (CE719) TA sessions - Sharif University of Technology
☆14 · Jul 10, 2021 · Updated 5 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆98 · Feb 9, 2023 · Updated 3 years ago
sayakpaul / count-tokens-hf-datasets
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Data…
☆27 · Oct 20, 2022 · Updated 3 years ago
data2ml / all-clip
Load any clip model with a standardized interface
☆22 · Oct 20, 2025 · Updated 9 months ago
MAEHCM / AET
Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models”
☆18 · Dec 6, 2022 · Updated 3 years ago
AnswerDotAI / uvws
A simple uv workspace
☆19 · Apr 5, 2025 · Updated last year
JustlyAI / lmss_entity_extractor
Tool to apply Legal Matter Specification Standard (LMSS) to documents
☆12 · Aug 15, 2024 · Updated last year
VITA-Group / TAPE
[ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…
☆15 · Jun 6, 2025 · Updated last year
davanstrien / hub-semantic-search-mcp
☆20 · Jun 9, 2025 · Updated last year
cindyxinyiwang / multiview-subword-regularization
PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization"
☆26 · Jun 2, 2021 · Updated 5 years ago
thombashi / typepy
A Python library for variable type checker/validator/converter at a run time.
☆17 · Updated this week
nateraw / spaces-docker-templates
🚀🤗 A collection of templates for Hugging Face Spaces
☆35 · Oct 9, 2023 · Updated 2 years ago
othr-nlp / rage_toolkit
☆11 · Sep 27, 2024 · Updated last year
bltlab / paranames
ParaNames: A multilingual resource for parallel names
☆40 · May 20, 2024 · Updated 2 years ago
LIONS-EPFL / LION
Linear Attention for Efficient Bidirectional Sequence Modeling
☆18 · May 13, 2025 · Updated last year
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆29 · Apr 17, 2024 · Updated 2 years ago
AnswerDotAI / dialoghelper
Helper functions for solveit dialogs
☆26 · Updated this week
dpasse / extr
Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions
☆10 · Jun 2, 2023 · Updated 3 years ago
johnrobinsn / alpaca_lora_30b_4bit
☆18 · Apr 3, 2023 · Updated 3 years ago
NP-NET-research / wdel
WDEL is an entity linking system based on the Wikidata knowledge base.
☆11 · Feb 12, 2025 · Updated last year
IINemo / llm-uncertainty-head
☆26 · Feb 23, 2026 · Updated 5 months ago
Yuanhy1997 / HyPe
HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]
☆14 · Jul 11, 2023 · Updated 3 years ago