Reverse Instructions to generate instruction tuning data with corpus examples
☆215Mar 5, 2024Updated last year
Alternatives and similar repositories for LongForm
Users who are interested in LongForm are comparing it to the libraries listed below
- Alpaca dataset from Stanford, cleaned and curated☆1,582Apr 14, 2023Updated 2 years ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best …☆10Nov 3, 2023Updated 2 years ago
- Reimplementation of the task generation part from the Alpaca paper☆119Apr 4, 2023Updated 2 years ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Oct 1, 2025Updated 5 months ago
- Easily deploy your rwkv model☆19May 5, 2023Updated 2 years ago
- Advanced Reasoning Benchmark Dataset for LLMs☆47Nov 19, 2023Updated 2 years ago
- Instruction Tuning with GPT-4☆4,342Jun 11, 2023Updated 2 years ago
- ☆1,560Feb 20, 2026Updated last week
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions☆823May 6, 2023Updated 2 years ago
- Trying to deconstruct RWKV in understandable terms☆14May 6, 2023Updated 2 years ago
- Search the biomedical literature for protein interactions and protein associations☆11Nov 24, 2023Updated 2 years ago
- ☆313Jun 9, 2024Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆30Jan 25, 2023Updated 3 years ago
- RWKV godot interface module☆61Jun 13, 2024Updated last year
- ☆180Feb 23, 2023Updated 3 years ago
- Data cleaning and curation for unstructured text☆329Aug 6, 2024Updated last year
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping.☆27Apr 21, 2023Updated 2 years ago
- Exploring finetuning public checkpoints on filtered 8K sequences on Pile☆115Mar 22, 2023Updated 2 years ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners☆116Jun 28, 2025Updated 8 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.☆552Mar 10, 2024Updated last year
- Generate textbook-quality synthetic LLM pretraining data☆509Oct 19, 2023Updated 2 years ago
- Node.js implementation binding for the RWKV.cpp module☆21Aug 2, 2023Updated 2 years ago
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)☆52Aug 13, 2023Updated 2 years ago
- Scaling Data-Constrained Language Models☆342Jun 28, 2025Updated 8 months ago
- Few-shot Learning with Auxiliary Data☆31Dec 8, 2023Updated 2 years ago
- Aligning pretrained language models with instruction data generated by themselves.☆4,580Mar 27, 2023Updated 2 years ago
- A collection of modular datasets generated by GPT-4: General-Instruct, Roleplay-Instruct, Code-Instruct, and Toolformer☆1,630Sep 15, 2023Updated 2 years ago
- Implementation of the paper Data Engineering for Scaling Language Models to 128K Context☆486Mar 19, 2024Updated last year
- Knowledge Infused Decoding☆71Dec 31, 2023Updated 2 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)☆4,741Jan 8, 2024Updated 2 years ago
- Dromedary: towards helpful, ethical and reliable LLMs.☆1,144Sep 18, 2025Updated 5 months ago
- Long-context pretrained encoder-decoder models☆96Oct 28, 2022Updated 3 years ago
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters☆5,936Mar 14, 2024Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers☆137Mar 14, 2024Updated last year
- Robust recipes to align language models with human and AI preferences☆5,506Sep 8, 2025Updated 5 months ago
- Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022)☆76Apr 10, 2023Updated 2 years ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆81Jan 18, 2024Updated 2 years ago
- LLMs as Collaboratively Edited Knowledge Bases☆46Feb 8, 2026Updated 3 weeks ago
- Tools for content datamining and NLP at scale☆44Jun 20, 2024Updated last year