QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B-parameter large language model. The dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic data to fine-tune downstream QA models, which improves accuracy over English-only and translation-based baselines.
★35 · Aug 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for QAmeleon
Users that are interested in QAmeleon are comparing it to the libraries listed below.
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ★11 · Apr 6, 2025 · Updated 11 months ago
- Ukrainian ELECTRA model ★12 · Mar 11, 2023 · Updated 3 years ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ★30 · Jan 25, 2023 · Updated 3 years ago
- Data Toolkit for Sailor Language Models ★96 · Feb 24, 2025 · Updated last year
- ★54 · Mar 18, 2026 · Updated last week
- A library for language transfer methods and algorithms. ★16 · Feb 6, 2026 · Updated last month
- Suffix array construction and searching algorithms for in-memory binary data. ★12 · Sep 10, 2022 · Updated 3 years ago
- From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks ★14 · Feb 23, 2023 · Updated 3 years ago
- Python module to remove wiki markup text. ★10 · Jan 15, 2016 · Updated 10 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ★89 · Feb 27, 2024 · Updated 2 years ago
- ★21 · Nov 20, 2020 · Updated 5 years ago
- A part-of-speech tagger with support for domain adaptation and external resources. ★24 · Oct 26, 2022 · Updated 3 years ago
- Submission archive for the MS MARCO passage ranking leaderboard ★13 · Apr 21, 2023 · Updated 2 years ago
- Non Metric Space (Approximate) Library in R ★12 · Feb 2, 2023 · Updated 3 years ago
- Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023) ★19 · Nov 28, 2022 · Updated 3 years ago
- A centrality-based Chinese keyphrase extraction tool ★11 · Sep 2, 2022 · Updated 3 years ago
- EWoK dataset generation framework ★10 · May 14, 2024 · Updated last year
- ★12 · Jul 6, 2023 · Updated 2 years ago
- Data Structures with Python (AIX20001) lecture materials ★18 · Jun 14, 2024 · Updated last year
- ★12 · Dec 13, 2022 · Updated 3 years ago
- ★24 · Oct 23, 2020 · Updated 5 years ago
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. ★16 · Sep 25, 2024 · Updated last year
- Few-shot Learning with Auxiliary Data ★31 · Dec 8, 2023 · Updated 2 years ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ★27 · Sep 10, 2024 · Updated last year
- KG data for ODA ★12 · Sep 21, 2024 · Updated last year
- Python package to augment multilingual data ★15 · Feb 15, 2023 · Updated 3 years ago
- Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models ★38 · Oct 1, 2025 · Updated 5 months ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ★19 · Updated this week
- [COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models ★18 · Jan 18, 2025 · Updated last year
- ★19 · Jun 9, 2025 · Updated 9 months ago
- A powerful text cleaner for Japanese web texts ★12 · Jan 20, 2024 · Updated 2 years ago
- SIGIR 2023 tutorial on cross language information retrieval. ★13 · Feb 28, 2024 · Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ★49 · Nov 13, 2023 · Updated 2 years ago
- A framework to train language models to learn invariant representations. ★14 · Jan 24, 2022 · Updated 4 years ago
- Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages" ★15 · Oct 4, 2024 · Updated last year
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset. ★32 · Mar 21, 2023 · Updated 3 years ago
- Rust binding to crfsuite ★25 · Jan 31, 2026 · Updated last month
- Forcing Diffuse Distributions out of Language Models ★18 · Sep 10, 2024 · Updated last year
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical … ★14 · Jan 25, 2026 · Updated 2 months ago