Datasets for Instruction Tuning of Large Language Models
☆261Nov 30, 2023Updated 2 years ago
Alternatives and similar repositories for instruction-datasets
Users that are interested in instruction-datasets are comparing it to the libraries listed below.
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)☆1,148Jan 4, 2024Updated 2 years ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models☆23Jul 27, 2024Updated last year
- ☆12Apr 25, 2022Updated 4 years ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks☆210Jan 13, 2024Updated 2 years ago
- A collection of awesome-prompt-datasets and awesome-instruction-dataset for training ChatLLMs such as ChatGPT, gathering a wide variety of instruction datasets for training ChatLLM models.☆730Apr 7, 2024Updated 2 years ago
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting☆2,771Aug 4, 2024Updated last year
- A curated list of awesome instruction tuning datasets, models, papers and repositories.☆347Jun 12, 2023Updated 2 years ago
- Reading list of instruction tuning, a trend that started with Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).☆767Jul 20, 2023Updated 2 years ago
- Papers and Datasets on Instruction Tuning and Following. ✨✨✨☆511Apr 4, 2024Updated 2 years ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.☆553Mar 10, 2024Updated 2 years ago
- Instruction Tuning with GPT-4☆4,337Jun 11, 2023Updated 2 years ago
- Alpaca dataset from Stanford, cleaned and curated☆1,596Mar 7, 2026Updated 2 months ago
- Expanding natural instructions☆1,044Dec 11, 2023Updated 2 years ago
- EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration☆36Mar 10, 2024Updated 2 years ago
- ICML'2022: Black-Box Tuning for Language-Model-as-a-Service & EMNLP'2022: BBTv2: Towards a Gradient-Free Future with Large Language Model…☆272Nov 8, 2022Updated 3 years ago
- The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco…☆38Aug 29, 2025Updated 8 months ago
- Open Academic Research on Improving LLaMA to SOTA LLM☆1,607Aug 30, 2023Updated 2 years ago
- OPD: Chinese Open-Domain Pre-trained Dialogue Model☆73Jun 5, 2023Updated 2 years ago
- ☆1,271Jul 30, 2024Updated last year
- Our paper is titled "NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Networks for better Cause-Effect Span Detection".☆13Feb 11, 2022Updated 4 years ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation☆476Mar 7, 2024Updated 2 years ago
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.☆11Dec 3, 2024Updated last year
- Aligning pretrained language models with instruction data generated by themselves.☆4,594Mar 27, 2023Updated 3 years ago
- Hello world demonstration for Weblate☆14Jan 20, 2026Updated 3 months ago
- ☆37May 31, 2023Updated 2 years ago
- Code for embedding and retrieval research.☆16Oct 24, 2023Updated 2 years ago
- This repo contains code and instructions for baselines in the VLUE benchmark.☆41Jul 16, 2022Updated 3 years ago
- Paper collection of methods that use language to interact with an environment, including interacting with the real world, a simulated world or WWW…☆128Jul 26, 2023Updated 2 years ago
- AllenAI's post-training codebase☆3,708Updated this week
- ☆98Jun 6, 2022Updated 3 years ago
- Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)☆2,833Mar 13, 2024Updated 2 years ago
- All-in-one text de-duplication☆756Mar 9, 2026Updated 2 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆953Mar 19, 2025Updated last year
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.☆390Oct 4, 2023Updated 2 years ago
- Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources☆33Jul 15, 2022Updated 3 years ago
- ☆1,562Apr 18, 2026Updated 3 weeks ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Jun 11, 2023Updated 2 years ago
- Feeling confused about super alignment? Here is a reading list☆43Jan 9, 2024Updated 2 years ago