Datasets for Instruction Tuning of Large Language Models
☆261Nov 30, 2023Updated 2 years ago
Alternatives and similar repositories for instruction-datasets
Users that are interested in instruction-datasets are comparing it to the libraries listed below.
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)☆1,146Jan 4, 2024Updated 2 years ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models☆23Jul 27, 2024Updated last year
- ☆12Apr 25, 2022Updated 3 years ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks☆209Jan 13, 2024Updated 2 years ago
- A collection of awesome-prompt-datasets, awesome-instruction-dataset, to train ChatLLM such as ChatGPT. Collects a wide variety of instruction datasets for training ChatLLM models.☆726Apr 7, 2024Updated last year
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting☆2,770Aug 4, 2024Updated last year
- A curated list of awesome instruction tuning datasets, models, papers and repositories.☆347Jun 12, 2023Updated 2 years ago
- Reading list of instruction tuning. A trend that starts from Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).☆767Jul 20, 2023Updated 2 years ago
- Papers and Datasets on Instruction Tuning and Following. ✨✨✨☆509Apr 4, 2024Updated last year
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.☆552Mar 10, 2024Updated 2 years ago
- Instruction Tuning with GPT-4☆4,337Jun 11, 2023Updated 2 years ago
- Alpaca dataset from Stanford, cleaned and curated☆1,584Mar 7, 2026Updated 3 weeks ago
- Expanding natural instructions☆1,038Dec 11, 2023Updated 2 years ago
- EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration☆36Mar 10, 2024Updated 2 years ago
- ICML'2022: Black-Box Tuning for Language-Model-as-a-Service & EMNLP'2022: BBTv2: Towards a Gradient-Free Future with Large Language Model…☆271Nov 8, 2022Updated 3 years ago
- The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco…☆38Aug 29, 2025Updated 7 months ago
- Open Academic Research on Improving LLaMA to SOTA LLM☆1,610Aug 30, 2023Updated 2 years ago
- OPD: Chinese Open-Domain Pre-trained Dialogue Model☆74Jun 5, 2023Updated 2 years ago
- ☆1,264Jul 30, 2024Updated last year
- Our paper is titled "NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Networks for better Cause-Effect Span Detection".☆13Feb 11, 2022Updated 4 years ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation☆476Mar 7, 2024Updated 2 years ago
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.☆10Dec 3, 2024Updated last year
- Aligning pretrained language models with instruction data generated by themselves.☆4,589Mar 27, 2023Updated 3 years ago
- Hello world demonstration for Weblate☆14Jan 20, 2026Updated 2 months ago
- ☆37May 31, 2023Updated 2 years ago
- Code for embedding and retrieval research.☆16Oct 24, 2023Updated 2 years ago
- This repo contains code and instructions for baselines in the VLUE benchmark.☆41Jul 16, 2022Updated 3 years ago
- Paper collection of methods that use language to interact with the environment, including interacting with the real world, a simulated world, or the WWW…☆129Jul 26, 2023Updated 2 years ago
- AllenAI's post-training codebase☆3,643Mar 23, 2026Updated last week
- ☆98Jun 6, 2022Updated 3 years ago
- Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)☆2,812Mar 13, 2024Updated 2 years ago
- All-in-one text de-duplication☆749Mar 9, 2026Updated 2 weeks ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆954Mar 19, 2025Updated last year
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.☆387Oct 4, 2023Updated 2 years ago
- Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources☆33Jul 15, 2022Updated 3 years ago
- ☆1,559Updated this week
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Jun 11, 2023Updated 2 years ago
- Feeling confused about super alignment? Here is a reading list☆44Jan 9, 2024Updated 2 years ago