google / corpuscrawler
Crawler for linguistic corpora
☆204 Updated last year
Alternatives and similar repositories for corpuscrawler
Users who are interested in corpuscrawler are comparing it to the libraries listed below
- A multilingual parallel corpus created from translations of the Bible. ☆182 Updated last month
- Bitextor generates translation memories from multilingual websites ☆294 Updated 8 months ago
- Universal Dependencies online documentation ☆287 Updated last week
- Various utilities for processing the data. ☆210 Updated last week
- Lexical data at Unicode ☆68 Updated 10 months ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆197 Updated 4 years ago
- 📂 Additional lookup tables and data resources for spaCy ☆105 Updated last month
- A cloud-based, open-source system for writing and publishing dictionaries. ☆93 Updated last year
- Collaborative data curation for Glottolog ☆168 Updated last week
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- Sentence aligner ☆115 Updated 4 years ago
- Open information and community for machine translation ☆78 Updated 2 weeks ago
- The Unicode Cookbook for Linguists ☆54 Updated 4 years ago
- Datasets and tools for basic natural language processing. ☆383 Updated 3 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆38 Updated 3 years ago
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆260 Updated 10 months ago
- Translation Memory Open-source Purifier ☆34 Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆158 Updated last year
- Language Tool style grammar handling with spaCy 2.0 ☆42 Updated 6 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- A modern, interlingual wordnet interface for Python ☆254 Updated last week
- Indian Language Tagger and Chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali) ☆41 Updated 2 years ago
- Cython wrapper on Hunspell Dictionary ☆66 Updated last year
- LingPy: Python library for quantitative tasks in historical linguistics ☆136 Updated 4 months ago
- A code for transliterating (romanizing) Arabic text using the American Library Association - Library of Congress (ALA-LC) standard ☆47 Updated 3 years ago
- Transliteration data and models ☆56 Updated 8 years ago
- Transform TMX to text ☆27 Updated 2 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆205 Updated 3 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆30 Updated 2 weeks ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆49 Updated 2 years ago