mlabonne / llm-datasets
Curated list of datasets and tools for post-training.
☆4,041 Updated 3 weeks ago
Alternatives and similar repositories for llm-datasets
Users who are interested in llm-datasets are comparing it to the repositories listed below.
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,956 Updated last week
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,484 Updated 6 months ago
- AllenAI's post-training codebase ☆3,395 Updated this week
- A quick guide (especially) for trending instruction finetuning datasets ☆3,319 Updated 2 years ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,162 Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,552 Updated 6 months ago
- A summary of existing representative LLM text datasets. ☆1,395 Updated last month
- ☆1,323 Updated 9 months ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,984 Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,757 Updated last week
- Tools for merging pretrained large language models. ☆6,533 Updated last week
- Robust recipes to align language models with human and AI preferences ☆5,439 Updated 3 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,542 Updated 10 months ago
- Awesome Reasoning LLM Tutorial/Survey/Guide ☆2,188 Updated last month
- PyTorch native post-training library ☆5,608 Updated this week
- Synthetic data curation for post-training and structured data extraction ☆1,564 Updated 4 months ago
- ☆692 Updated 7 months ago
- Open-source AI cookbook