santhoshtr / sfst
Stuttgart Finite State Transducer system
☆23 · Updated 5 months ago
Alternatives and similar repositories for sfst
Users who are interested in sfst are comparing it to the libraries listed below.
- Tools for working with book data ☆18 · Updated last month
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆66 · Updated last month
- Crop And Splice Segments (of scanned pages) ☆14 · Updated 6 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 · Updated 2 years ago
- Lexical data at Unicode ☆70 · Updated last year
- Automatically exported from code.google.com/p/guess-language ☆54 · Updated 3 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆57 · Updated 4 years ago
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words. ☆36 · Updated 2 years ago
- Library of Congress coding standards ☆31 · Updated last year
- ☆17 · Updated 2 weeks ago
- Github mirror - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing) ☆37 · Updated last year
- 🌸 Train floret vectors ☆18 · Updated 2 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 · Updated 5 years ago
- Aksharamukha Python Library ☆56 · Updated 11 months ago
- Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages. ☆22 · Updated last month
- Tools for TICCL ☆14 · Updated last month
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 · Updated last year
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆15 · Updated 6 years ago
- Link Wikidata items to large catalogs ☆96 · Updated 3 months ago
- Linked SDMX ☆17 · Updated 11 years ago
- A sentence segmentation library with wide language support optimized for speed and utility. ☆82 · Updated last month
- WordNet-LMF formats ☆24 · Updated 2 months ago
- A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks. ☆15 · Updated 11 years ago
- 🔍 Mirror of https://gerrit.wikimedia.org/g/mediawiki/extensions/CirrusSearch. See https://www.mediawiki.org/wiki/Developer_access for co… ☆45 · Updated this week
- Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group. ☆19 · Updated 2 weeks ago
- search interface for scholarly works ☆85 · Updated last year
- OCRopus model for Gothic print (Fraktur) ☆19 · Updated 5 years ago
- Homebase of the IPTC EXTRA project about rule-based text categorization ☆13 · Updated 8 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆69 · Updated 5 years ago
- Flask Interface to Thompson's Motif Index ☆18 · Updated 6 years ago