masakhane-io / masakhanePreprocessor
Building an effective preprocessing tool for African languages
☆13 Updated last year
Alternatives and similar repositories for masakhanePreprocessor
Users who are interested in masakhanePreprocessor are comparing it to the repositories listed below.
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆13 Updated last year
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆35 Updated 2 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆79 Updated 3 years ago
- Streamlit app to Translate text to or between 50 languages with mBART-50 from Huggingface and Facebook ☆25 Updated 4 years ago
- MasakhaNEWS: News Topic Classification for African Languages ☆24 Updated last year
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ☆12 Updated 3 years ago
- Crosslingual Question Answering for African Languages ☆30 Updated last year
- NeatText: a simple NLP package for cleaning textual data and text preprocessing ☆74 Updated 2 years ago
- A simple library for segmenting legal texts ☆17 Updated 2 years ago
- COMET for African languages ☆10 Updated 10 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- Repository containing awesome resources regarding Hugging Face tooling. ☆48 Updated last year
- MAFAND-MT ☆60 Updated last year
- A collection of textual datasets in Hausa language and the corresponding translation in English language. ☆16 Updated 4 years ago
- POS for African languages ☆19 Updated 5 months ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆76 Updated last month
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 Updated 3 months ago
- Experimentation on Google's Gemma model ☆16 Updated last year
- A dataset for pretraining language models targeted for legal tasks. ☆140 Updated 3 years ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆34 Updated 3 months ago
- Python interface for evaluation on ChatGPT models ☆19 Updated last year
- Synthetic Text Dataset Generation for LLM projects ☆53 Updated 3 weeks ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- Using short models to classify long texts ☆21 Updated 2 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆80 Updated 2 years ago
- Transforming textual descriptions into process models using deep learning ☆15 Updated 6 years ago
- Mining Legal Arguments in Court Decisions - Data and software ☆73 Updated 2 years ago