openlanguageprofiles / olp-en-cefrj
Open Language Profiles — English profile datasets from CEFR-J
☆116 · Updated 4 years ago
Alternatives and similar repositories for olp-en-cefrj:
Users who are interested in olp-en-cefrj are comparing it to the libraries listed below.
- A corpus of short answers written by learners of English and graded with CEFR levels ☆10 · Updated 3 years ago
- Repository for CEFR-SP corpus and sentence level assessment ☆35 · Updated 5 months ago
- Analyzes a given text and determines its vocabulary level based on CEFR levels ☆44 · Updated 2 years ago
- Gather modern English word frequencies from all enwiki articles. ☆210 · Updated 11 months ago
- UniDic packaged for installation via pip. ☆85 · Updated last year
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code ☆60 · Updated last year
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆27 · Updated last week
- NLP system for predicting the reading difficulty level of a text in terms of its CEFR level. ☆47 · Updated 2 months ago
- BERT-based GEC tagging for Japanese ☆16 · Updated last year
- 🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes. ☆240 · Updated 9 months ago
- Multilingual sentence alignment using sentence embeddings ☆108 · Updated 3 months ago
- Converts English text to IPA notation ☆376 · Updated last year
- Tokenizer, POS-tagger, lemmatizer, and dependency parser for modern and contemporary Japanese with BERT models ☆17 · Updated 7 months ago
- Tokenizer, POS-tagger, and dependency parser with BERT/RoBERTa/DeBERTa/GPT models for Japanese and other languages ☆51 · Updated last month
- MFTE (Multi Feature Tagger of English) Python is the Python version based on Le Foll's MFTE written in Perl. It is extended to include se… ☆22 · Updated 3 weeks ago
- Python version of Doug Biber's Multidimensional Analysis (MDA) ☆29 · Updated 2 months ago
- ☆11 · Updated 11 months ago
- An NLP tool to classify a text's Lexile level ☆32 · Updated 2 months ago
- JavaScript Lemmatizer is a lemmatization library to retrieve a base form from an English inflected word. ☆66 · Updated 3 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆40 · Updated last year
- The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels. ☆39 · Updated last year
- A modern, interlingual wordnet interface for Python ☆232 · Updated 2 weeks ago
- Sentence aligner ☆109 · Updated 3 years ago
- 🈵 Collected resources for learning and studying Manchu (the Manchu language): Manchu language, Manchu people, and an introduction to Manchu. ☆12 · Updated last year
- Massively multilingual pronunciation mining ☆331 · Updated 3 months ago
- Improved Sentence Alignment in Linear Time and Space ☆165 · Updated last year
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆24 · Updated 3 years ago
- A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis. ☆426 · Updated last month
- The source of the phonetic transcriptions is Oxford Advanced Learner's Dictionary (3rd ed.), available from the Oxford Text Archive (http… ☆23 · Updated 7 years ago
- Creates interlinearized versions of books (EPUB, MOBI, etc.), adding "subtitles" with translations under each word in the text. ☆23 · Updated 4 years ago