QAmeleon introduces synthetic multilingual QA data generated with PaLM, a 540B-parameter large language model. The dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic data to fine-tune downstream QA models, which improves accuracy compared with English-only and translation-based baselines.
☆34 · Aug 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for QAmeleon
Users interested in QAmeleon are comparing it to the libraries listed below.
- [ACL 2025] Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- Gzip and nearest neighbors for text classification ☆57 · Aug 1, 2023 · Updated 2 years ago
- A collection of utilities for handling IPA phones. ☆27 · Sep 24, 2023 · Updated 2 years ago
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 3 years ago
- Data Toolkit for Sailor Language Models ☆96 · Feb 24, 2025 · Updated last year
- ☆55 · Apr 18, 2026 · Updated 2 weeks ago
- From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks ☆15 · Feb 23, 2023 · Updated 3 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆89 · Feb 27, 2024 · Updated 2 years ago
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- ☆57 · Nov 5, 2024 · Updated last year
- A part-of-speech tagger with support for domain adaptation and external resources. ☆24 · Oct 26, 2022 · Updated 3 years ago
- ☆12 · Apr 1, 2026 · Updated last month
- ☆37 · Nov 14, 2025 · Updated 5 months ago
- Manifest list for a multi-arch Docker image ☆11 · Jan 23, 2019 · Updated 7 years ago
- Non-Metric Space (Approximate) Library in R ☆12 · Feb 2, 2023 · Updated 3 years ago
- Deep memory and sequence models in JAX ☆28 · Apr 23, 2026 · Updated last week
- Machine learning for molecular dynamics ☆13 · Jan 9, 2025 · Updated last year
- Inference and deployment toolkit for Svara-TTS, an open-source multilingual text-to-speech model for Indic languages ☆21 · Apr 1, 2026 · Updated last month
- A centrality-based Chinese keyphrase extraction tool ☆11 · Sep 2, 2022 · Updated 3 years ago
- CDF FAQ ☆11 · Aug 16, 2022 · Updated 3 years ago
- ☆12 · Jul 6, 2023 · Updated 2 years ago
- Data Structures with Python (AIX20001) lecture materials ☆18 · Jun 14, 2024 · Updated last year
- ☆12 · Dec 13, 2022 · Updated 3 years ago
- ☆15 · Nov 20, 2025 · Updated 5 months ago
- ☆11 · Jun 19, 2022 · Updated 3 years ago
- https://arxiv.org/abs/2404.10917 ☆14 · Mar 18, 2025 · Updated last year
- LEMON: Explainable Entity Matching ☆19 · Apr 6, 2022 · Updated 4 years ago
- Master's thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. ☆16 · Sep 25, 2024 · Updated last year
- Few-shot Learning with Auxiliary Data ☆31 · Dec 8, 2023 · Updated 2 years ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆82 · Apr 11, 2024 · Updated 2 years ago
- ☆11 · Jun 2, 2022 · Updated 3 years ago
- [COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models ☆18 · Jan 18, 2025 · Updated last year
- Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models ☆38 · Updated this week
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆21 · Mar 23, 2026 · Updated last month
- ☆19 · Jun 9, 2025 · Updated 10 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆49 · Nov 13, 2023 · Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Nov 21, 2022 · Updated 3 years ago
- A framework to train language models to learn invariant representations. ☆14 · Jan 24, 2022 · Updated 4 years ago
- Scaling Sparse Fine-Tuning to Large Language Models ☆19 · Jan 31, 2024 · Updated 2 years ago
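The "Gzip and nearest neighbors for text classification" entry in the list above refers to compression-based classification. As a rough illustration of the general idea, here is a minimal sketch. It is not that repository's code. It assumes the common formulation: the normalized compression distance (NCD) between two texts is computed from gzip-compressed lengths, and a query is labeled by majority vote over its k nearest training texts. The function names `compressed_len`, `ncd`, and `classify` are hypothetical.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share lots of content."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)  # compress the concatenation of both texts
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label query by majority vote over its k nearest (text, label) pairs under NCD."""
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training data, made up purely for illustration.
    train = [
        ("the striker scored a late goal and the football club won the league match", "sports"),
        ("the goalkeeper saved a penalty as the football club drew the league match", "sports"),
        ("the central bank raised interest rates and the stock market fell sharply", "finance"),
        ("investors sold bonds after the central bank signaled higher interest rates", "finance"),
    ]
    print(classify("the football club won the league match after a late goal", train))
```

This approach needs no training step. The cost is one compression per (query, training text) pair, so on large training sets it is much slower than a learned classifier at inference time.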