VikParuchuri/textbook_quality

VikParuchuri / textbook_quality

Generate textbook-quality synthetic LLM pretraining data

☆508

Alternatives and similar repositories for textbook_quality

Users who are interested in textbook_quality are comparing it to the repositories listed below.

VikParuchuri / libgen_to_txt
Convert all of libgen to high quality markdown
☆253 · Dec 13, 2023 · Updated 2 years ago
VikParuchuri / classified
Score LLM pretraining data with classifiers
☆54 · Nov 2, 2023 · Updated 2 years ago
SciPhi-AI / library-of-phi
☆182 · Oct 13, 2023 · Updated 2 years ago
SciPhi-AI / synthesizer
A multi-purpose LLM framework for RAG and data creation.
☆625 · Jan 13, 2024 · Updated 2 years ago
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,051 · Mar 7, 2024 · Updated 2 years ago
ChrisHayduk / QLoRA-for-MLM
QLoRA for Masked Language Modeling
☆23 · Sep 11, 2023 · Updated 2 years ago
taylorai / galactic
data cleaning and curation for unstructured text
☆329 · Aug 6, 2024 · Updated last year
databricks / lilac
Curate better data for LLMs
☆1,072 · Mar 19, 2024 · Updated 2 years ago
abacaj / fine-tune-mistral
Fine-tune mistral-7B on 3090s, a100s, h100s
☆735 · Oct 11, 2023 · Updated 2 years ago
sabetAI / BLoRA
batched loras
☆350 · Sep 6, 2023 · Updated 2 years ago
huggingface / cosmopedia
☆572 · Nov 20, 2024 · Updated last year
imoneoi / multipack
Multipack distributed sampler for fast padding-free training of LLMs
☆207 · Aug 10, 2024 · Updated last year
jondurbin / bagel
A bagel, with everything.
☆326 · Apr 11, 2024 · Updated 2 years ago
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆92 · Sep 22, 2025 · Updated 10 months ago
AblateIt / finetune-study
Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes.
☆82 · Sep 10, 2023 · Updated 2 years ago
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,643 · May 26, 2026 · Updated last month
SkunkworksAI / hydra-moe
☆416 · Nov 2, 2023 · Updated 2 years ago
ChrisHayduk / qlora-multi-gpu
QLoRA with Enhanced Multi GPU Support
☆38 · Aug 8, 2023 · Updated 2 years ago
abacaj / train-with-fsdp
☆93 · Oct 5, 2023 · Updated 2 years ago
emrgnt-cmplxty / zero-shot-replication
☆75 · Sep 5, 2023 · Updated 2 years ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆170 · Sep 18, 2025 · Updated 10 months ago
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,740 · Apr 17, 2024 · Updated 2 years ago
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,260 · Jun 17, 2026 · Updated last month
CarperAI / decontamination
This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
☆28 · Apr 21, 2023 · Updated 3 years ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,344 · Updated this week
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,220 · Updated this week
Birch-san / booru-embed
[WIP] Transformer to embed Danbooru labelsets
☆13 · Mar 31, 2024 · Updated 2 years ago
nlpxucan / WizardLM
LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath
☆9,480 · Jun 7, 2025 · Updated last year
teknium1 / GPTeacher
A collection of modular datasets generated by GPT-4, General-Instruct - Roleplay-Instruct - Code-Instruct - and Toolformer
☆1,668 · Sep 15, 2023 · Updated 2 years ago
theblackcat102 / evol-dataset
evol augment any dataset online
☆61 · Aug 3, 2023 · Updated 2 years ago
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,411 · Apr 11, 2024 · Updated 2 years ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆12,242 · Updated this week
uclaml / SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,247 · May 8, 2024 · Updated 2 years ago
explosion / curated-transformers
🤖 A PyTorch library of curated Transformer models and their composable components
☆892 · Apr 17, 2024 · Updated 2 years ago
migtissera / Sensei
Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI
☆221 · Apr 29, 2024 · Updated 2 years ago
NousResearch / Open-Reasoning-Tasks
A comprehensive repository of reasoning tasks for LLMs (and beyond)
☆497 · Sep 27, 2024 · Updated last year
allenai / dolma
Data and tools for generating and inspecting OLMo pre-training data.
☆1,527 · Nov 5, 2025 · Updated 8 months ago
QuixiAI / SystemChat
☆31 · Jul 5, 2024 · Updated 2 years ago
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆234 · Oct 31, 2024 · Updated last year