masakhane-io / masakhanePreprocessor
Building an effective preprocessing tool for African languages
☆13 Updated last year
Alternatives and similar repositories for masakhanePreprocessor
Users who are interested in masakhanePreprocessor are comparing it to the repositories listed below
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆13 Updated last year
- Crosslingual Question Answering for African Languages ☆31 Updated 11 months ago
- MasakhaNEWS: News Topic Classification for African Languages ☆24 Updated last year
- A collection of textual datasets in Hausa language and the corresponding translation in English language. ☆16 Updated 4 years ago
- MAFAND-MT ☆57 Updated last year
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ☆12 Updated 3 years ago
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data. ☆16 Updated last year
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆33 Updated last year
- Streamlit app to Translate text to or between 50 languages with mBART-50 from Huggingface and Facebook ☆25 Updated 4 years ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆48 Updated last year
- A simple library for segmenting legal texts ☆17 Updated 2 years ago
- POS for African languages ☆18 Updated 2 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆77 Updated 3 years ago
- Transforming textual descriptions into process models using deep learning ☆15 Updated 6 years ago
- COMET for African languages ☆10 Updated 7 months ago
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ☆220 Updated last month
- NuNER is the family of SOTA Foundation and Zero-shot for Entity Recognition ☆14 Updated last year
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated 2 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 4 years ago
- A dataset for pretraining language models targeted for legal tasks. ☆139 Updated 3 years ago
- Benchmarking algorithms for assessing quality of data labeled by multiple annotators ☆33 Updated 2 years ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆33 Updated last month
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 Updated 2 years ago
- Dataset for the NLPMC @ NAACL 2021 Paper: Assertion Detection in Clinical Notes: Medical Language Models to the Rescue? ☆15 Updated 3 years ago
- Tool to take your ML model from local to production with one-line of code. ☆25 Updated last year
- NeatText a simple NLP package for cleaning textual data and text preprocessing ☆72 Updated last year