mlabonne / llm-datasets

Curated list of datasets and tools for post-training.

☆3,261

Alternatives and similar repositories for llm-datasets

Users who are interested in llm-datasets are comparing it to the repositories listed below.


argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆2,806 Updated this week
huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a…
☆1,465 Updated 6 months ago
Zjh-819 / LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
☆3,176 Updated last year
allenai / open-instruct
AllenAI's post-training codebase
☆3,061 Updated this week
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆6,016 Updated 3 weeks ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆1,722 Updated last week
wasiahmad / Awesome-LLM-Synthetic-Data
A reading list on LLM based Synthetic Data Generation 🔥
☆1,338 Updated last month
philschmid / deep-learning-pytorch-huggingface
☆1,247 Updated 4 months ago
pytorch / torchtune
PyTorch native post-training library
☆5,323 Updated last week
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,473 Updated this week
gkamradt / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆1,934 Updated 11 months ago
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,260 Updated last week
ashishpatel26 / LLM-Finetuning
LLM Finetuning with peft
☆2,565 Updated 5 months ago
stanfordnlp / pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
☆1,495 Updated 5 months ago
zou-group / textgrad
TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients.
☆2,741 Updated last week
mistralai / mistral-finetune
☆2,984 Updated 10 months ago
mistralai / cookbook
☆1,902 Updated last week
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆3,274 Updated last month
lmmlzn / Awesome-LLMs-Datasets
Summarize existing representative LLMs text datasets.
☆1,311 Updated 3 months ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆9,937 Updated this week
tencent-ailab / persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
☆1,230 Updated 4 months ago
maitrix-org / llm-reasoners
A library for advanced large language model reasoning
☆2,182 Updated last month
huggingface / cookbook
Open-source AI cookbook
☆2,145 Updated last week
bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,446 Updated last week
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,034 Updated last week
trotsky1997 / MathBlackBox
☆1,027 Updated 7 months ago
prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆963 Updated 2 months ago
JShollaj / awesome-llm-interpretability
A curated list of Large Language Model (LLM) Interpretability resources.
☆1,378 Updated 3 weeks ago
facebookresearch / large_concept_model
Large Concept Models: Language modeling in a sentence representation space
☆2,246 Updated 5 months ago
huggingface / smol-course
A course on aligning smol models.
☆6,026 Updated 2 weeks ago