google / corpuscrawler
Crawler for linguistic corpora
☆204 Updated last year
Alternatives and similar repositories for corpuscrawler
Users who are interested in corpuscrawler are comparing it to the libraries listed below
- Various utilities for processing the data. ☆209 Updated this week
- Universal Dependencies online documentation ☆285 Updated this week
- Collaborative data curation for Glottolog ☆165 Updated last week
- Lexical data at Unicode ☆68 Updated 9 months ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆194 Updated 4 years ago
- Bitextor generates translation memories from multilingual websites ☆293 Updated 7 months ago
- Cython wrapper on Hunspell Dictionary ☆66 Updated 11 months ago
- TED parallel Corpora is a growing collection of Bilingual parallel corpora, Multilingual parallel corpora and Monolingual corpora extracted… ☆249 Updated 9 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆38 Updated 3 years ago
- The Global WordNet Association Collaborative Inter-Lingual Index ☆43 Updated 7 months ago
- Python Finite-State Toolkit ☆56 Updated last week
- Sentence aligner ☆114 Updated 4 years ago
- English data ☆208 Updated last week
- Efficient Low-Memory Aligner ☆145 Updated 5 months ago
- Language Tool style grammar handling with spaCy 2.0 ☆42 Updated 6 years ago
- Machine-Translation-based sentence alignment tool for parallel text ☆309 Updated 4 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆157 Updated last year
- Transliteration data and models ☆56 Updated 8 years ago
- LingPy: Python library for quantitative tasks in historical linguistics ☆134 Updated 3 months ago
- Python framework for processing Universal Dependencies data ☆57 Updated this week
- Translation Memory Open-source Purifier ☆34 Updated 2 years ago
- ConllEditor is a tool to edit dependency syntax trees in CoNLL-U format. ☆56 Updated 3 weeks ago
- A character-wise tokenizer for morphologically rich languages ☆27 Updated 3 months ago
- Democratizing NLP! ☆105 Updated last year
- List of research and engineering of NLP for American Native/Indigenous Languages. ☆92 Updated 4 years ago
- The Unicode Cookbook for Linguists ☆54 Updated 4 years ago
- A multilingual parallel corpus created from translations of the Bible. ☆181 Updated last month
- The World Atlas of Language Structures ☆61 Updated 8 months ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago