eligugliotta / tarc
Tunisian Arabish Corpus
☆11Updated last year
Alternatives and similar repositories for tarc
Users that are interested in tarc are comparing it to the libraries listed below
- MAFAND-MT☆59Updated last year
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM"☆61Updated last year
- TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).☆57Updated 2 years ago
- ☆128Updated last year
- A collection of preprocessed datasets and pretrained models for generating paraphrases.☆31Updated 4 years ago
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling☆21Updated last year
- ☆17Updated 2 years ago
- Crosslingual Question Answering for African Languages☆30Updated last year
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation☆88Updated 4 months ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages.☆34Updated 8 months ago
- aiXplain enables python programmers to add AI functions to their software.☆49Updated last week
- ☆42Updated 2 years ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.☆336Updated 11 months ago
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks.☆26Updated 11 months ago
- Python interface for evaluation on ChatGPT models☆19Updated last year
- ☆21Updated 3 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆63Updated last year
- A simple semi-supervised approach for creating huggingface data script loaders and upload to the hub.☆11Updated last year
- مستودع الأوراق المسحية في معالجة اللغة العربية (أسبر) A Repository for survey and review papers in Arabic Natural Language processing (AN…☆84Updated last week
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages☆78Updated 3 years ago
- This is a diacritization model for Arabic language. This model was built/trained using the Tashkeela: the Arabic diacritization corpus on…☆45Updated 2 years ago
- A tiny BERT for low-resource monolingual models☆31Updated 2 months ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME☆105Updated 7 months ago
- LLM_library is a comprehensive repository that serves as a one-stop resource for hands-on code and insightful summaries.☆69Updated last year
- A comprehensive list of Arabic NLP resources.☆42Updated 2 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023☆106Updated last year
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati…☆41Updated 2 years ago
- Arabic cleaning, normalization and segmentation library.☆72Updated 2 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.☆52Updated last month
- Fine Tuning Multimodal LLM "Idefics 9B" on Pokemon Go Dataset available on Hugging Face.☆18Updated last year