remusao / wgraph
Etymological graphs based on Wiktionary dumps
☆18 · Updated last year
Related projects
Alternatives and complementary repositories for wgraph
- This repository contains code behind the visualization of the Wikimedia tool etytree at http://tools.wmflabs.org/etytree/ ☆50 · Updated 5 years ago
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆21 · Updated 2 years ago
- English Lemma Database - Compiled by Referencing British National Corpus ☆29 · Updated last month
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆94 · Updated this week
- An LL parser for extracting information from Wiki text, particularly Wiktionary. ☆48 · Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆52 · Updated 3 years ago
- Helsinki Finite-State Technology (library and application suite) ☆123 · Updated this week
- German Morphological Analyzer ☆47 · Updated 3 years ago
- Real world example to demonstrate advanced techniques to unmarshall very large xml document with very low memory footprint. ☆58 · Updated last year
- Interactive visualization of Wiktionary words and etymologies. ☆90 · Updated this week
- An open etymology dataset created using Wiktionary data. Contains 3.8M entries, 1.8M terms, 2900 languages, and 31 unique relationship ty… ☆79 · Updated 6 months ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆43 · Updated last year
- Gather modern English word frequencies from all enwiki articles. ☆204 · Updated 8 months ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word. ☆65 · Updated 3 years ago
- Wiktionary parser tool for many language editions. ☆53 · Updated 2 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 · Updated last year
- Offline etymological dictionary based on Wiktionary data ☆20 · Updated 2 years ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆16 · Updated this week
- A character-wise tokenizer for morphologically rich languages ☆27 · Updated 5 months ago
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code ☆50 · Updated last year
- A list of vocabulary lists ☆21 · Updated 4 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆144 · Updated this week
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 · Updated 5 years ago
- The 134,000+ words and their pronunciations in the CMU pronouncing dictionary ☆67 · Updated 3 years ago
- A cloud-based, open-source system for writing and publishing dictionaries. ☆86 · Updated 10 months ago
- A multilingual parallel corpus created from translations of the Bible. ☆176 · Updated 2 months ago
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆61 · Updated this week
- A list of words from the SUBTLEX movie subtitles corpus, sorted by frequency. ☆32 · Updated 4 years ago
- Sentence aligner ☆108 · Updated 3 years ago
- The curation repository for the data behind Concepticon. ☆34 · Updated this week