AndreiRegiani / wikipedia-crawler
Extracts plain text from Wikipedia articles; ideal for performing linguistic analysis on a specific topic.
☆39 (updated 2 months ago)
Alternatives and similar repositories for wikipedia-crawler:
Users who are interested in wikipedia-crawler are comparing it to the libraries listed below.
- A Python truecasing utility that restores case information for text ☆88 (updated 2 years ago)
- Stand-alone WordNet API ☆48 (updated 3 years ago)
- A Python module for word inflections, designed for use with spaCy ☆92 (updated 5 years ago)
- Sequence tagging with spaCy and CRFsuite ☆19 (updated 2 years ago)
- Implementation of Hobbs' algorithm for coreference resolution in Python ☆44 (updated 4 years ago)
- General-purpose neural networks for sentence boundary detection ☆73 (updated 2 years ago)
- A Java converter from Wikipedia markup to plain text ☆37 (updated 3 years ago)
- Python wrapper for Facebook's Duckling ☆23 (updated 8 months ago)
- Python wrapper for Aspell (C extension and Python version) ☆81 (updated last year)
- Python library for converting UTF to WX and vice versa for Indian languages ☆47 (updated 2 years ago)
- 📚 Text classification library with Keras ☆52 (updated 4 years ago)
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆73 (updated 10 years ago)
- PyTorch implementation of context2vec from Melamud et al., CoNLL 2016 ☆19 (updated 6 years ago)
- High-performance trie and Aho-Corasick automaton (AC automaton) keyword match-and-replace tool for Python. Correct case insensitive implementa… ☆94 (updated 6 months ago)
- Fast, DB-backed pretrained word embeddings for natural language processing ☆222 (updated 2 weeks ago)
- LM, ULMFiT, et al. ☆46 (updated 5 years ago)
- Character-level CNN for text classification ☆54 (updated 3 years ago)
- Text simplification system and dataset ☆123 (updated last year)
- Python implementation of ROUGE ☆31 (updated 7 years ago)
- A fully customisable language detection pipeline for spaCy ☆92 (updated 5 years ago)
- Code and data for segmentation experiments ☆22 (updated 10 years ago)
- Multilingual abstractive summarization dataset extracted from WikiHow ☆90 (updated last month)
- An asynchronous concurrent pipeline for classifying Common Crawl, based on fastText's pipeline ☆86 (updated 3 years ago)
- (no description) ☆15 (updated 6 years ago)
- A Python implementation of the BM25 ranking function ☆235 (updated 5 years ago)
- Fast and customizable tokenization ☆64 (updated 5 years ago)
- An extractive neural network text summarization library for the EMNLP 2018 paper "Content Selection in Deep Learning Models of Summarizat… ☆107 (updated 5 years ago)
- SQL-to-Text: simple code for SQL-to-text generation with a novel graph-to-sequence model ☆73 (updated 6 years ago)
- Data and scripts for the shared task "Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)" at SemEval 2015 ☆43 (updated 4 years ago)
- A PyTorch implementation of auto-punctuation learned character by character ☆142 (updated 4 years ago)