boun-tabi / SQuAD-TR
☆9 · Updated 7 months ago
Alternatives and similar repositories for SQuAD-TR:
Users who are interested in SQuAD-TR are comparing it to the repositories listed below.
- Text Classification Dataset for Turkish Language ☆10 · Updated 3 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- Using short models to classify long texts ☆21 · Updated last year
- Repo for Turkish Wiki NER dataset. ☆11 · Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆24 · Updated 9 months ago
- ☆22 · Updated 2 years ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆52 · Updated 3 weeks ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated 10 months ago
- spaCyTurk - trained models & pipelines for Turkish ☆18 · Updated 2 years ago
- ☆12 · Updated 3 months ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 3 months ago
- Official implementation of "GPT or BERT: why not both?" ☆45 · Updated 2 months ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Updated 3 years ago
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆20 · Updated 2 years ago
- Pre-train Static Word Embeddings ☆34 · Updated this week
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 · Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆99 · Updated 8 months ago
- LTG-Bert ☆29 · Updated last year
- ☆20 · Updated 2 years ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 · Updated 7 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆110 · Updated last month
- Embedding Recycling for Language models ☆38 · Updated last year
- ☆21 · Updated last year
- A library of translation-based text similarity measures ☆25 · Updated last year
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆29 · Updated 3 months ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆17 · Updated 2 months ago
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. ☆13 · Updated 4 months ago
- Experiments for XLM-V Transformers Integration ☆13 · Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated 10 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆29 · Updated last year