frankier / wikiparse
Scrapes some Finnish word definitions from English Wiktionary.
☆7 Updated last year
Related projects:
- Frontend for Korp, a tool using the IMS Open Corpus Workbench (CWB). ☆16 Updated this week
- Tools for TICCL ☆14 Updated this week
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆60 Updated 4 months ago
- Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an… ☆14 Updated 3 weeks ago
- Recipes for training OpenNMT systems ☆14 Updated 7 years ago
- Parser for KAF NAF files written in Python ☆15 Updated 3 years ago
- A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of ling… ☆15 Updated last year
- Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages. ☆20 Updated last year
- ☆21 Updated this week
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near-complete information about (1) which facts ar… ☆10 Updated last year
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ☆17 Updated 2 weeks ago
- Wiktionary parser tool for many language editions. ☆53 Updated 2 years ago
- Command-line corpus tools ☆9 Updated 7 years ago
- WordNet-LMF formats ☆20 Updated last week
- Wikidata authority file mapping tool ☆11 Updated 6 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 6 years ago
- Stanford Tregex-inspired language for rule-based dependency tree manipulation. ☆21 Updated 7 years ago
- Basic dataset for the linguistic data collection. ☆15 Updated 7 years ago
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆13 Updated 4 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 Updated last year
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 4 years ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆12 Updated 6 months ago
- The Mueller Report Corpus V 0.1 ☆11 Updated 4 years ago
- A web-based, token-level annotation tool for non-standard language data ☆10 Updated 3 years ago
- Pikes is a Knowledge Extraction Suite ☆23 Updated 10 months ago
- python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi… ☆18 Updated 4 years ago
- Character Vomiting ☆10 Updated 6 years ago
- The curation repository for the data behind Concepticon. ☆32 Updated this week
- Source for lemon-model.net ☆11 Updated 2 years ago