google-research-datasets / QAmeleonLinks
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning PaLM with only five examples per language. We use the synthetic data to finetune downstream QA models leading to improved accuracy in comparison to English-only and translation-based baselines.
☆34Updated last year
Alternatives and similar repositories for QAmeleon
Users that are interested in QAmeleon are comparing it to the libraries listed below
Sorting:
- Embedding Recycling for Language models☆38Updated 2 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆27Updated last year
- My explorations into editing the knowledge and memories of an attention network☆35Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆48Updated last year
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning☆33Updated 6 months ago
- Ranking of fine-tuned HF models as base models.☆35Updated 2 months ago
- QLoRA for Masked Language Modeling☆22Updated last year
- Using short models to classify long texts☆21Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.☆93Updated 2 years ago
- ☆44Updated 7 months ago
- A library for squeakily cleaning and filtering language datasets.☆47Updated 2 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated last year
- Experiments with generating opensource language model assistants☆97Updated 2 years ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers☆58Updated last year
- ☆14Updated 9 months ago
- 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.☆82Updated 3 years ago
- Exploring finetuning public checkpoints on filter 8K sequences on Pile☆116Updated 2 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32Updated last year
- ☆35Updated last year
- ☆32Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- ☆12Updated 7 months ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te…☆42Updated last year
- ☆23Updated 2 years ago
- Experiments for efforts to train a new and improved t5☆76Updated last year
- ☆19Updated last year
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit☆63Updated 2 years ago
- Code for NeurIPS LLM Efficiency Challenge☆59Updated last year
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati…☆39Updated 2 years ago
- ☆68Updated 10 months ago