newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆60 · Updated last month
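The listing does not show the repository's own code, so what follows is not its method. It is a minimal Python sketch of the general streaming technique the description refers to. It uses the standard library's `xml.etree.ElementTree.iterparse` to handle one element at a time and discards each element once it has been read. The helper name `iter_entries`, the `<entry>` element and its `id` attribute are all hypothetical, and the sketch assumes the entries are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def iter_entries(source, tag="entry"):
    """Stream (id, text) pairs for every <tag> element directly under the root.

    Hypothetical helper: the element name and the "id" attribute are
    assumptions chosen for illustration, not the repository's schema.
    """
    # Ask for "start" events too, so the very first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # The element is complete here; copy out what we need.
            yield elem.get("id"), (elem.text or "").strip()
            # Discard every finished child of the root so memory stays flat
            # no matter how many entries the file contains.
            root.clear()


# Usage on a tiny in-memory document (a real run would pass a file path).
sample = b"""<dictionary>
  <entry id="apple">A fruit.</entry>
  <entry id="bee">An insect.</entry>
</dictionary>"""

for word, gloss in iter_entries(io.BytesIO(sample)):
    print(word, "->", gloss)  # apple -> A fruit.  then  bee -> An insect.
```

The sketch clears the root after each entry instead of only the entry itself. This also drops the empty child references the root would otherwise keep accumulating. For namespaced dumps such as Wikimedia exports, the `tag` argument would need the `{namespace}` prefix.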
Alternatives and similar repositories for dictionary-builder:
Users interested in dictionary-builder are comparing it to the libraries listed below.
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆29 · Updated last week
- ☆47 · Updated 2 years ago
- Multilingual implementation of RAKE algorithm for Rust ☆33 · Updated 2 months ago
- Context-sensitive word embeddings with subwords. In Rust. ☆87 · Updated last year
- An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction ☆37 · Updated 2 months ago
- A Rust library for reading and writing WARC files ☆53 · Updated 5 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆100 · Updated this week
- Scraping Wikipedia for fair use sentences ☆54 · Updated last year
- Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing. ☆76 · Updated last year
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot ☆13 · Updated 4 years ago
- Helsinki Finite-State Technology (library and application suite) ☆129 · Updated 2 weeks ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆154 · Updated 5 months ago
- Simple NLP in Rust with Python bindings ☆150 · Updated last year
- Text hyphenation for Rust ☆54 · Updated last year
- ☆65 · Updated 2 years ago
- Various utilities regarding Levenshtein transducers. ☆68 · Updated 4 years ago
- Rust bindings for the spaCy library. ☆22 · Updated 2 years ago
- An LL parser for extracting information from Wiki text, particularly Wiktionary. ☆49 · Updated last year
- Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm. ☆136 · Updated 3 months ago
- finalfusion embeddings in Rust ☆100 · Updated last year
- Python bindings for Rust's fst crate ☆51 · Updated 5 years ago
- Rust implementation of Duckling ☆78 · Updated 3 years ago
- A blazingly fast phonetic reduction/hashing algorithm. ☆215 · Updated 3 years ago
- A persistent datastore backed by RocksDB with fuzzy key lookup using an arbitrary distance function accelerated by the SymSpell algorithm ☆14 · Updated last year
- Java Wiktionary Library ☆57 · Updated 2 years ago
- Lexical data at Unicode ☆68 · Updated 8 months ago
- A CRF based Chinese Named-entity Recognition Library written in Rust ☆14 · Updated 4 years ago
- PDF command-line utils written in Rust ☆39 · Updated last month
- German part-of-speech dictionary ☆45 · Updated last year