newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆58Updated 11 months ago
Related projects:
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo…☆92Updated this week
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu…☆60Updated 4 months ago
- A set of utilities for processing MediaWiki XML dump data.☆44Updated last month
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project☆43Updated last month
- An LL parser for extracting information from Wiki text, particularly Wiktionary.☆48Updated last year
- The Global WordNet Association Collaborative Inter-Lingual Index☆40Updated 3 months ago
- Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an…☆14Updated 3 weeks ago
- Offline bilingual dictionaries made using data from Wiktionary☆52Updated 9 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings.☆44Updated 3 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆139Updated last month
- CLI tool for importing entities from Wikidata / Wikibase☆23Updated last year
- The Language Learning Toolkit (LLTK) performs a variety of tasks useful for (human) language learning.☆41Updated 4 years ago
- Sort-friendly URI Reordering Transform (SURT) python module☆39Updated last month
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr…☆65Updated last week
- Link Wikidata items to large catalogs☆98Updated 6 months ago
- WordNet-LMF formats☆20Updated last week
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.☆142Updated 7 months ago
- A part-of-speech tagger with support for domain adaptation and external resources.☆22Updated last year
- Multi Tier Annotation Search☆26Updated 3 years ago
- German Morphological Analyzer☆45Updated 2 years ago
- Offline etymological dictionary based on Wiktionary data☆20Updated 2 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format☆22Updated 6 years ago
- A tool to analyse, browse and query Wikidata☆84Updated 7 months ago
- Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpu…☆18Updated 2 years ago
- Multi Tier Annotation Search☆12Updated 4 months ago
- Fast PDF generation and compression. Deals with millions of pages daily.☆97Updated last month
- Command line interface to Wikidata Query Service☆54Updated 5 months ago
- Loadable spellfix1 extension for sqlite as python package☆25Updated 5 months ago
- Machine-readable Wiktionary☆74Updated 4 months ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…☆110Updated 2 months ago