newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆59 · Updated last year
Alternatives and similar repositories for dictionary-builder:
Users who are interested in dictionary-builder are comparing it to the libraries listed below.
- A Rust library for reading and writing WARC files ☆50 · Updated 2 months ago
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆29 · Updated 3 months ago
- ☆45 · Updated 2 years ago
- Links on the web break all the time, robustify them! ☆52 · Updated 4 years ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 · Updated this week
- wabac.js - Web Archive Browsing Augmentation Client ☆106 · Updated last week
- CLI tool for importing entities from Wikidata / Wikibase ☆23 · Updated 2 years ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot ☆13 · Updated 4 years ago
- Context-sensitive word embeddings with subwords. In Rust. ☆87 · Updated last year
- an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction ☆33 · Updated last week
- Java Wiktionary Library ☆57 · Updated 2 years ago
- Multilingual implementation of RAKE algorithm for Rust ☆33 · Updated this week
- A blazingly fast phonetic reduction/hashing algorithm. ☆215 · Updated 3 years ago
- JavaScript module and CLI tool for working with web archive data using the WACZ format specification. ☆13 · Updated last week
- zim reader in rust ☆28 · Updated 2 years ago
- ☆63 · Updated last year
- Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing. ☆73 · Updated last year
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ☆17 · Updated this week
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 · Updated 6 months ago
- Command line interface to Wikidata Query Service ☆55 · Updated 10 months ago
- Rust crate for entity parsing ☆16 · Updated 2 years ago
- finalfusion embeddings in Rust ☆95 · Updated last year
- Command line tool for digging into WARC files ☆38 · Updated this week
- RDF parsers library ☆86 · Updated last month
- Sort-friendly URI Reordering Transform (SURT) python module ☆41 · Updated 6 months ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆30 · Updated last month
- ☆16 · Updated 3 months ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆46 · Updated 3 months ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆155 · Updated 4 years ago
- A SQLite extension for quickly generating random numbers, booleans, characters, and blobs ☆18 · Updated last year