masakhane-io / masakhanePreprocessor
Building an effective preprocessing tool for African languages
☆13 Updated 2 years ago
Alternatives and similar repositories for masakhanePreprocessor
Users who are interested in masakhanePreprocessor are comparing it to the libraries listed below
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆36 Updated 3 months ago
- A simple library for segmenting legal texts ☆17 Updated 2 years ago
- MasakhaNEWS: News Topic Classification for African Languages ☆24 Updated last year
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆80 Updated 3 years ago
- Transforming textual descriptions into process models using deep learning ☆15 Updated 6 years ago
- Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters … ☆76 Updated last month
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆13 Updated last year
- A curated list of materials on AI guardrails ☆45 Updated 8 months ago
- 💙 Unstructured Data Connectors for Haystack 2.0 ☆17 Updated 2 years ago
- POS for African languages ☆19 Updated 7 months ago
- Crosslingual Question Answering for African Languages ☆30 Updated last year
- Chunk your text using gpt4o-mini more accurately ☆44 Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆81 Updated 2 years ago
- OpenNyAI is a mission aimed at developing open source software and datasets to catalyze the creation of AI-powered solutions to improve a… ☆91 Updated last year
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data. ☆16 Updated 2 years ago
- MAFAND-MT ☆60 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- A collection of textual datasets in Hausa language and the corresponding translation in English language. ☆16 Updated 4 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated 2 years ago
- Mining Legal Arguments in Court Decisions - Data and software ☆73 Updated 2 years ago
- A personal knowledge base that I can dump information to and help me learn ☆25 Updated 8 months ago
- Medical domain-focused GPT-2 fine-tuning, optimization, and lightweighting research repository (compared to GPT-4). ☆37 Updated last year
- Repository containing awesome resources regarding Hugging Face tooling. ☆48 Updated 2 years ago
- Writing Blog Posts with Generative Feedback Loops! ☆50 Updated last year
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ☆237 Updated 6 months ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆34 Updated 5 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 Updated last year