frankier / wikiparse
Scrapes some Finnish word definitions from English Wiktionary.
☆8 Updated last year
Alternatives and similar repositories for wikiparse:
Users interested in wikiparse are comparing it to the libraries listed below.
- A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of ling… ☆15 Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆63 Updated 11 months ago
- A web-based, token-level annotation tool for non-standard language data ☆10 Updated 4 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?) ☆36 Updated 10 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 Updated 3 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ☆17 Updated 2 weeks ago
- Stanford Tregex-inspired language for rule-based dependency tree manipulation. ☆21 Updated 8 years ago
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near-complete information about (1) which facts ar… ☆11 Updated 2 years ago
- Python bindings to the Dutch NLP tool Frog (POS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser… ☆49 Updated last month
- ☆14 Updated 3 years ago
- Recipes for training OpenNMT systems ☆14 Updated 7 years ago
- eXtensible Interlinear Glossed Text ☆33 Updated 2 years ago
- Json Wikipedia contains code to convert the Wikipedia XML dump into a JSON dump. Questions? https://gitter.im/idio-opensource/Lobby ☆17 Updated 2 years ago
- Bilingual dictionary extractor from parallel corpora ☆22 Updated 10 years ago
- Tools for TICCL ☆14 Updated 4 months ago
- Parser for KAF NAF files written in Python ☆16 Updated 3 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 6 years ago
- OCRopus model for Gothic print (Fraktur) ☆18 Updated 5 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆112 Updated 3 months ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆12 Updated last year
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 4 years ago
- Pikes is a Knowledge Extraction Suite ☆23 Updated last year
- WordNet-LMF formats ☆21 Updated 2 months ago
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆13 Updated 5 years ago
- A simple configurable tool for manipulating dependency trees. ☆13 Updated 4 months ago
- A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used st… ☆24 Updated 4 months ago
- Lexical data at Unicode ☆68 Updated 8 months ago