masakhane-io / masakhanePreprocessor
Building an effective preprocessing tool for African languages
☆12 Updated last year
Alternatives and similar repositories for masakhanePreprocessor:
Users who are interested in masakhanePreprocessor are comparing it to the libraries listed below.
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆12 Updated 11 months ago
- Crosslingual Question Answering for African Languages ☆29 Updated 6 months ago
- ☆109 Updated last year
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆32 Updated last year
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆103 Updated 11 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆72 Updated 2 years ago
- AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/ ☆48 Updated last year
- MAFAND-MT ☆55 Updated 8 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- MasakhaNEWS: News Topic Classification for African Languages ☆21 Updated 10 months ago
- A simple library for segmenting legal texts ☆15 Updated last year
- ☆22 Updated 10 months ago
- ☆54 Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆98 Updated 11 months ago
- Open information and community for machine translation ☆74 Updated last week
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated last year
- Named entity recognition for the legal domain ☆43 Updated 3 years ago
- ☆17 Updated 2 years ago
- A collection of textual datasets in Hausa language and the corresponding translation in English language. ☆15 Updated 4 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 3 years ago
- AfroLID, a powerful neural toolkit for African language identification which covers 517 African languages. ☆31 Updated 3 weeks ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated last year
- ☆23 Updated last year
- All our community docs! Start here! Let's put Africa on the NLP Map ☆59 Updated 11 months ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆70 Updated 7 months ago
- Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy. ☆118 Updated 11 months ago
- Domain-Specific Text Generation for Machine Translation (with LLMs) - scripts and config files for the paper ☆15 Updated last year
- Almost state-of-the-art text generation library ☆66 Updated 5 months ago
- Mining Legal Arguments in Court Decisions - Data and software ☆66 Updated last year
- Contents moved to https://github.com/deepset-ai/haystack-home ☆32 Updated 2 years ago