rspeer / wikiparsec
An LL parser for extracting information from Wiki text, particularly Wiktionary.
☆48 · Updated last year
Alternatives and similar repositories for wikiparsec:
Users interested in wikiparsec are comparing it to the libraries listed below.
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. · ☆51 · Updated 3 years ago
- The curation repository for the data behind Concepticon. · ☆38 · Updated last month
- Stanford Tregex-inspired language for rule-based dependency tree manipulation. · ☆21 · Updated 8 years ago
- Command-line corpus tools · ☆9 · Updated 7 years ago
- This is a new backend implementation of the ANNIS linguistic search and visualization system. · ☆17 · Updated 2 weeks ago
- Wikidata property explorer · ☆17 · Updated last year
- Wiktionary parser tool for many language editions. · ☆54 · Updated 2 years ago
- Bilingual sentence aligner (Gale & Church, 1993) · ☆14 · Updated 6 years ago
- Pandoc filter to use Wikidata as reference manager · ☆17 · Updated 4 years ago
- Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/ · ☆17 · Updated 6 years ago
- The Open Multilingual Wordnet · ☆61 · Updated 10 months ago
- TEI Reader Python Library · ☆17 · Updated last year
- eXtensible Interlinear Glossed Text · ☆32 · Updated 2 years ago
- Basic dataset for the linguistic data collection. · ☆15 · Updated 8 years ago
- Supervised learning of morphology · ☆28 · Updated 8 years ago
- Frontend for Korp, a tool using the IMS Open Corpus Workbench (CWB). · ☆16 · Updated last week
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… · ☆63 · Updated 10 months ago
- Simple CORPORA list crawler · ☆10 · Updated 8 years ago
- ☆30 · Updated 8 years ago
- Text-Induced Corpus Clean-up · ☆20 · Updated last year
- Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an… · ☆18 · Updated 4 months ago
- WordNet-LMF formats · ☆21 · Updated last month
- A tool for analyzing the word histories of a text. · ☆34 · Updated 4 months ago
- Modernized version of Eric Brill's Part Of Speech tagger. · ☆17 · Updated last year
- OCRopus model for Gothic print (Fraktur) · ☆18 · Updated 5 years ago
- Recipes for training OpenNMT systems · ☆14 · Updated 7 years ago
- Manifests of the public domain images uploaded to Flickr Commons, with descriptive information about the books they were taken from. · ☆75 · Updated 10 years ago
- Automatically exported from code.google.com/p/hunpos · ☆12 · Updated 6 years ago
- PhiloLogic4 · ☆38 · Updated 3 months ago
- This repository contains code behind the visualization of the Wikimedia tool etytree at http://tools.wmflabs.org/etytree/ · ☆51 · Updated 5 years ago