google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual question-answering (QA) data generated with PaLM, a 540B-parameter large language model. The data was produced by prompt tuning PaLM on only five examples per language. We use the synthetic data to fine-tune downstream QA models, which improves accuracy over English-only and translation-based baselines.
☆ 35 · Aug 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for QAmeleon
Users who are interested in QAmeleon are comparing it to the repositories listed below.
- JAX Scalify: end-to-end scaled arithmetics · ☆ 18 · Oct 30, 2024 · Updated last year
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment · ☆ 11 · Apr 6, 2025 · Updated 10 months ago
- Submission archive for the MS MARCO passage ranking leaderboard · ☆ 13 · Apr 21, 2023 · Updated 2 years ago
- Gzip and nearest neighbors for text classification · ☆ 57 · Aug 1, 2023 · Updated 2 years ago
- Data Toolkit for Sailor Language Models · ☆ 96 · Feb 24, 2025 · Updated 11 months ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? · ☆ 11 · Apr 18, 2023 · Updated 2 years ago
- Library of models for protein function prediction (part of the 18th-place solution out of 1,625 teams in CAFA5) · ☆ 20 · May 23, 2025 · Updated 8 months ago
- ☆ 52 · Jul 20, 2025 · Updated 6 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… · ☆ 49 · Nov 13, 2023 · Updated 2 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… · ☆ 89 · Feb 27, 2024 · Updated last year
- The paper list of multilingual pre-trained models (continually updated). · ☆ 24 · Jun 18, 2024 · Updated last year
- ☆ 30 · Jul 5, 2023 · Updated 2 years ago
- ☆ 24 · Oct 23, 2020 · Updated 5 years ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients · ☆ 26 · Sep 10, 2024 · Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning · ☆ 30 · Jan 25, 2023 · Updated 3 years ago
- Data Structures with Python (AIX20001) lecture materials · ☆ 18 · Jun 14, 2024 · Updated last year
- A collection of utilities for handling IPA phones. · ☆ 26 · Sep 24, 2023 · Updated 2 years ago
- Few-shot Learning with Auxiliary Data · ☆ 31 · Dec 8, 2023 · Updated 2 years ago
- ☆ 28 · Feb 27, 2025 · Updated 11 months ago
- Official implementation of ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis · ☆ 36 · May 30, 2025 · Updated 8 months ago
- Code for the ACL 2023 paper: "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Sc… · ☆ 35 · Sep 16, 2023 · Updated 2 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… · ☆ 31 · Jan 14, 2023 · Updated 3 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆ 89 · Oct 30, 2024 · Updated last year
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx… · ☆ 137 · Aug 2, 2023 · Updated 2 years ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning · ☆ 35 · Aug 9, 2023 · Updated 2 years ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning · ☆ 13 · Aug 17, 2023 · Updated 2 years ago
- Repository of IPBench · ☆ 19 · Jan 4, 2026 · Updated last month
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… · ☆ 40 · Feb 5, 2024 · Updated 2 years ago
- ☆ 91 · Aug 18, 2024 · Updated last year
- Understanding how features learned by neural networks evolve throughout training · ☆ 39 · Oct 24, 2024 · Updated last year
- This project studies the performance and robustness of language models and task-adaptation methods. · ☆ 155 · May 18, 2024 · Updated last year
- [NeurIPS 2025] Let LRMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604 · ☆ 55 · Nov 4, 2025 · Updated 3 months ago
- DPO, but faster · ☆ 47 · Dec 6, 2024 · Updated last year
- ☆ 161 · Apr 17, 2024 · Updated last year
- The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper https://arxiv.org/abs/2212.01349) · ☆ 158 · Jan 6, 2023 · Updated 3 years ago
- Embedding Recycling for Language models · ☆ 38 · Jul 11, 2023 · Updated 2 years ago
- Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings. · ☆ 92 · Jul 21, 2024 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. · ☆ 96 · Feb 9, 2023 · Updated 3 years ago
- GBM implementation on Legate · ☆ 14 · Jan 28, 2026 · Updated 2 weeks ago