newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆60 · Updated 2 months ago
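The description above refers to streaming (event-based) XML parsing. In this approach, each record is processed and then discarded as soon as it has been read, so the whole document is never held in memory at once. The following is a minimal, generic sketch of that idea using Python's standard-library `xml.etree.ElementTree.iterparse`. It is not taken from dictionary-builder, which may use a different language and XML library. The `stream_pages` helper and the flat `<page><title>…</title><text>…</text></page>` layout are hypothetical. Real inputs such as MediaWiki XML dumps are namespaced and nest the text under `<revision>`, so the tag matching would need adjusting for them.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple, Union


def stream_pages(
    source: Union[str, IO[bytes]], tag: str = "page"
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (title, text) for each <tag> element, one at a time.

    Hypothetical helper: assumes an un-namespaced layout such as
    <dump><page><title>...</title><text>...</text></page>...</dump>.
    """
    # Request "start" events too, so the very first event hands us the root element.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)

    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # On the "end" event the element is complete, so its children are readable.
            yield elem.findtext("title"), elem.findtext("text")
            # Detach everything parsed so far from the root so finished
            # pages can be garbage-collected instead of piling up in the tree.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<dump>"
        b"<page><title>cat</title><text>a feline</text></page>"
        b"<page><title>dog</title><text>a canine</text></page>"
        b"</dump>"
    )
    for title, text in stream_pages(sample):
        print(title, "->", text)
```

Note that `iterparse` still builds a tree incrementally behind the scenes. The `root.clear()` call is what keeps memory roughly bounded by a single record rather than by the whole file.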
Alternatives and similar repositories for dictionary-builder
Users interested in dictionary-builder are comparing it to the libraries listed below.
- Python package for Wikimedia dump processing (Wiktionary, Wikipedia etc.). Wikitext parsing, template expansion, Lua module execution. Fo… ☆101 · Updated 2 weeks ago
- Command line interface to Wikidata Query Service ☆55 · Updated last year
- An FTS5 extension for signal_tokenizer. ☆54 · Updated 9 months ago
- Java Wiktionary Library ☆57 · Updated 2 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆49 · Updated 7 months ago
- A Rust library for reading and writing WARC files ☆52 · Updated 6 months ago
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆29 · Updated last month
- A database of languages and their Wikidata id, Wikimedia language code, ISO 639-1, ISO 639-2, ISO 639-3, ISO 639-6 codes ☆16 · Updated last month
- German part-of-speech dictionary ☆45 · Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆160 · Updated 3 weeks ago
- An LL parser for extracting information from Wiki text, particularly Wiktionary. ☆49 · Updated last year
- Links on the web break all the time, robustify them! ☆54 · Updated 4 years ago
- ☆47 · Updated 2 years ago
- Context-sensitive word embeddings with subwords. In Rust. ☆87 · Updated last year
- command-line tool to extract taxonomies from Wikidata ☆126 · Updated 5 years ago
- ☆66 · Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆63 · Updated last year
- A set of utilities for processing MediaWiki XML dump data. ☆53 · Updated 3 months ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆72 · Updated last year
- Web hub based on Wikidata ☆37 · Updated 2 years ago
- Python package for harvesting records from OAI-PMH provider(s). ☆63 · Updated 2 years ago
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆97 · Updated 7 months ago
- ONIX validation library and commandline tool ☆24 · Updated 2 months ago
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification. ☆24 · Updated last year
- produce a stream of citation data coming off Wikimedia ☆12 · Updated 8 years ago
- A library for fetching and reading Tatoeba's weekly exports ☆23 · Updated last year
- A blazingly fast phonetic reduction/hashing algorithm. ☆215 · Updated 3 years ago
- Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm. ☆136 · Updated 4 months ago
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ☆17 · Updated last week