newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆61Updated 8 months ago
Alternatives and similar repositories for dictionary-builder
Users who are interested in dictionary-builder are comparing it to the libraries listed below:
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs)☆29Updated last week
- ☆51Updated 3 years ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo…☆108Updated 2 weeks ago
- Rust crate for entity parsing☆17Updated 2 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings.☆56Updated 4 years ago
- Java Wiktionary Library☆58Updated 3 years ago
- Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm.☆140Updated 5 months ago
- An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://…☆37Updated 2 months ago
- Helsinki Finite-State Technology (library and application suite)☆136Updated last month
- Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.☆78Updated 2 years ago
- Archived Python/Rust hybrid codebase - see divvun/kbdgen for v3☆26Updated 3 years ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot☆13Updated 4 years ago
- Context-sensitive word embeddings with subwords. In Rust.☆89Updated 2 years ago
- Port of arc90labs-readability in Rust☆132Updated last year
- Multilingual implementation of RAKE algorithm for Rust☆35Updated 9 months ago
- Fast English word segmentation in Rust☆101Updated last month
- XPath, XQuery, and XSLT for Rust☆130Updated 3 months ago
- Simple NLP in Rust with Python bindings☆153Updated 2 years ago
- Machine-readable Wiktionary☆77Updated last year
- finalfusion embeddings in Rust☆104Updated 2 years ago
- This is a new backend implementation of the ANNIS linguistic search and visualization system.☆18Updated 2 months ago
- A blazingly fast phonetic reduction/hashing algorithm.☆218Updated 4 years ago
- ☆67Updated 2 years ago
- A Rust library for reading and writing WARC files☆56Updated last year
- Rust wrapper for libxml2☆87Updated last week
- fastText Rust binding☆63Updated last year
- PyTorch models for the ocrs OCR engine☆74Updated last year
- A character encoding detector for legacy Web content.☆108Updated 4 months ago
- A Rust implementation of some popular Snowball stemming algorithms☆129Updated last year
- English Lemma Database - Compiled by Referencing British National Corpus☆33Updated last year