newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆60Updated 4 months ago
Alternatives and similar repositories for dictionary-builder
Users who are interested in dictionary-builder are comparing it to the libraries listed below.
- ☆48Updated 2 years ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo…☆105Updated last week
- Helsinki Finite-State Technology (library and application suite)☆133Updated 2 months ago
- Java Wiktionary Library☆57Updated 2 years ago
- The code, training pipeline, and models that power Firefox Translations☆198Updated this week
- Context-sensitive word embeddings with subwords. In Rust.☆87Updated last year
- Offline etymological dictionary based on Wiktionary data☆21Updated 3 years ago
- Rust wrapper for libxml2☆83Updated last week
- A blazingly fast phonetic reduction/hashing algorithm.☆218Updated 3 years ago
- German part-of-speech dictionary☆45Updated last year
- Full-text IPFS-friendly and WASM-compatible Search in Rust☆274Updated 2 months ago
- A Rust library for reading and writing WARC files☆56Updated 8 months ago
- Fast English word segmentation in Rust☆100Updated last month
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆168Updated 2 months ago
- Port of arc90labs-readability to Rust☆129Updated last year
- Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.☆78Updated last year
- A persistent datastore backed by RocksDB with fuzzy key lookup using an arbitrary distance function accelerated by the SymSpell algorithm☆14Updated last year
- Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm.☆139Updated last month
- The Unicode Cookbook for Linguists☆56Updated 4 years ago
- Multilingual implementation of RAKE algorithm for Rust☆34Updated 5 months ago
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs)☆29Updated 3 months ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project☆51Updated last week
- Rust crate for entity parsing☆17Updated 2 years ago
- finalfusion embeddings in Rust☆102Updated last year
- fastText Rust binding☆61Updated last year
- A FTS5 extension for signal_tokenizer.☆56Updated 11 months ago
- Rust implementation of Duckling☆79Updated 4 years ago
- ☆66Updated 2 years ago
- A sentence segmentation library with wide language support optimized for speed and utility.☆66Updated last month
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings.☆53Updated 4 years ago