newca12 / dictionary-builder
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.
☆60 Updated last week
Alternatives and similar repositories for dictionary-builder:
Users who are interested in dictionary-builder are comparing it to the libraries listed below.
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 Updated 3 years ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot ☆13 Updated 4 years ago
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ☆17 Updated last week
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆63 Updated 10 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 Updated 3 weeks ago
- ONIX validation library and commandline tool ☆23 Updated this week
- A tool for creating pivot tables from the command line. ☆14 Updated 2 years ago
- Linguistic search for large annotated text corpora, based on Apache Lucene ☆111 Updated this week
- An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction ☆36 Updated 3 weeks ago
- German Morphological Analyzer ☆47 Updated 3 years ago
- Text-Induced Corpus Clean-up ☆20 Updated last year
- Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆29 Updated 3 weeks ago
- Java Wiktionary Library ☆57 Updated 2 years ago
- Context-sensitive word embeddings with subwords. In Rust. ☆87 Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆154 Updated 4 months ago
- Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing. ☆73 Updated last year
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆49 Updated 4 months ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆112 Updated 2 months ago
- An LL parser for extracting information from Wiki text, particularly Wiktionary. ☆48 Updated last year
- Offline etymological dictionary based on Wiktionary data ☆21 Updated 3 years ago
- This repository contains code behind the visualization of the Wikimedia tool etytree at http://tools.wmflabs.org/etytree/ ☆51 Updated 5 years ago
- Various utilities regarding Levenshtein transducers. ☆68 Updated 4 years ago
- An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification ☆86 Updated last year
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- Spelling correction & Fuzzy search based on Symmetric Delete spelling correction algorithm. ☆134 Updated last month
- WordNet-LMF formats ☆21 Updated last month
- A Rust library for reading and writing WARC files ☆53 Updated 4 months ago
- Python package for harvesting records from OAI-PMH provider(s). ☆62 Updated 2 years ago