eligugliotta / tarc
Tunisian Arabish Corpus
☆11Updated last year
Alternatives and similar repositories for tarc
Users who are interested in tarc are comparing it to the repositories listed below
- MAFAND-MT☆57Updated last year
- Python interface for evaluation on ChatGPT models☆19Updated last year
- A collection of preprocessed datasets and pretrained models for generating paraphrases.☆30Updated 4 years ago
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM"☆60Updated 9 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages☆74Updated 3 years ago
- aiXplain enables python programmers to add AI functions to their software.☆47Updated last week
- Crosslingual Question Answering for African Languages☆31Updated 10 months ago
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E…☆23Updated 2 years ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages.☆31Updated 5 months ago
- LTG-Bert☆33Updated last year
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks.☆26Updated 8 months ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME☆100Updated 4 months ago
- zero shot NER fine tuning☆13Updated 4 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆65Updated last year
- A tiny BERT for low-resource monolingual models☆31Updated 10 months ago
- Using short models to classify long texts☆21Updated 2 years ago
- TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).☆56Updated 2 years ago
- ☆17Updated 2 years ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆147Updated 2 months ago
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation☆85Updated last month
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling☆20Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.☆80Updated last year
- A blueprint for creating Pretraining and Fine-Tuning datasets for Indic languages☆107Updated 10 months ago
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati…☆41Updated 2 years ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆63Updated 2 months ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP☆58Updated 3 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆27Updated last year
- ☆20Updated 3 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆59Updated last year
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2…☆69Updated 2 years ago