A Python implementation of the Translate-Align-Retrieve method for automatically translating the SQuAD dataset into Spanish.
☆59 · Dec 8, 2022 · Updated 3 years ago
Alternatives and similar repositories for TranslateAlignRetrieve
Users that are interested in TranslateAlignRetrieve are comparing it to the libraries listed below
- This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering". ☆80 · Jun 3, 2021 · Updated 4 years ago
- Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation" ☆40 · Jan 2, 2019 · Updated 7 years ago
- Code for our Paper, 'Summaformers @ LaySumm 20, LongSumm 20' at EMNLP 2020, Scholarly Document Processing Workshop ☆12 · Feb 10, 2021 · Updated 5 years ago
- New dataset ☆311 · Aug 31, 2021 · Updated 4 years ago
- TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and … ☆317 · May 28, 2020 · Updated 5 years ago
- Highly specialized crate to parse and use `google/sentencepiece`'s precompiled_charsmap in `tokenizers` ☆21 · Jan 8, 2026 · Updated 2 months ago
- Neural Paraphrase Generation based on OpenNMT-py ☆12 · Jan 2, 2018 · Updated 8 years ago
- Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER" ☆43 · Oct 15, 2022 · Updated 3 years ago
- Parallel Universal Dependencies. ☆15 · Nov 12, 2025 · Updated 4 months ago
- ☆33 · Aug 16, 2021 · Updated 4 years ago
- Pre-training BART in Flax on The Pile dataset ☆22 · Jul 24, 2021 · Updated 4 years ago
- Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022) ☆76 · Apr 10, 2023 · Updated 2 years ago
- ☆21 · Nov 20, 2020 · Updated 5 years ago
- benchmarks for evaluating MT models ☆11 · Jun 26, 2024 · Updated last year
- ☆207 · Nov 12, 2021 · Updated 4 years ago
- A collection of basic text processing modules focused on Gujarati ☆10 · Oct 24, 2017 · Updated 8 years ago
- A starter kit for evaluating benchmarks on the 🤗 Hub ☆16 · Dec 29, 2023 · Updated 2 years ago
- End-to-end training of Retrieval-Augmented LMs (REALM, RAG) ☆23 · Nov 22, 2023 · Updated 2 years ago
- ☆20 · Apr 5, 2021 · Updated 4 years ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l… ☆25 · Jul 27, 2024 · Updated last year
- The pipeline for the OSCAR corpus ☆176 · Nov 9, 2025 · Updated 4 months ago
- ☆11 · Jul 31, 2025 · Updated 7 months ago
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) ☆22 · Apr 11, 2020 · Updated 5 years ago
- A collection of Danish Transformers ☆30 · Aug 27, 2021 · Updated 4 years ago
- Placeholder repository ☆15 · Mar 16, 2022 · Updated 4 years ago
- Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP ☆10 · Jun 26, 2021 · Updated 4 years ago
- Key 3d computer vision math, concepts, and code ☆14 · Dec 10, 2020 · Updated 5 years ago
- ☆13 · Sep 2, 2021 · Updated 4 years ago
- ☆16 · Nov 6, 2019 · Updated 6 years ago
- The source for the astropy data repository (although the primary server is not on github) ☆13 · Mar 1, 2026 · Updated 2 weeks ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 · Apr 15, 2024 · Updated last year
- CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data ☆19 · Dec 4, 2021 · Updated 4 years ago
- ☆10 · Oct 28, 2019 · Updated 6 years ago
- various web scrapers as examples ☆17 · Oct 10, 2020 · Updated 5 years ago
- A Python wrapper for the bioRxiv API. ☆10 · Aug 18, 2021 · Updated 4 years ago
- Extract intent and entities from natural language utterances ☆36 · Jan 4, 2018 · Updated 8 years ago
- Small utility to monitor fairseq training in tensorboard ☆21 · Apr 28, 2019 · Updated 6 years ago
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages ☆15 · Apr 11, 2020 · Updated 5 years ago
- automatically align transcribed audio and generate a wav2letter training corpus ☆36 · Apr 11, 2023 · Updated 2 years ago