mlabonne / llm-datasets
Curated list of datasets and tools for post-training.
☆4,124 Updated last month
Alternatives and similar repositories for llm-datasets
Users who are interested in llm-datasets are comparing it to the libraries listed below.
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,008 Updated last week
- A reading list on LLM-based Synthetic Data Generation 🔥 ☆1,498 Updated 6 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,226 Updated 2 weeks ago
- ☆1,334 Updated 10 months ago
- Tools for merging pretrained large language models. ☆6,647 Updated 2 weeks ago
- AllenAI's post-training codebase ☆3,488 Updated this week
- Open-source AI cookbook ☆2,558 Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,795 Updated last week
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,548 Updated this week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆2,019 Updated 3 weeks ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,652 Updated 7 months ago
- PyTorch native post-training library ☆5,639 Updated this week
- LLM Finetuning with peft ☆2,740 Updated 5 months ago
- ☆694 Updated 8 months ago
- Robust recipes to align language models with human and AI preferences ☆5,466 Updated 3 months ago
- A quick guide (especially) for trending instruction finetuning datasets ☆3,334 Updated 2 years ago
- A library for advanced large language model reasoning ☆2,319 Updated 6 months ago
- A course on aligning smol models. ☆6,561 Updated last month
- A summary of existing representative LLM text datasets. ☆1,412 Updated 2 months ago
- ☆2,132 Updated 2 weeks ago
- Synthetic data curation for post-training and structured data extraction ☆1,591 Updated this week
- Explore a comprehensive collection of resources, tutorials, papers, tools, and best practices for fine-tuning Large Language Models (LLMs… ☆497 Updated last year
- Minimalistic large language model 3D-parallelism training ☆2,396 Updated 3 weeks ago
- ☆3,054 Updated last month
- Best practices for distilling large language models. ☆596 Updated last year
- DataComp for Language Models ☆1,403 Updated 3 months ago
- ☆4,249 Updated 5 months ago
- Optimizing inference proxy for LLMs ☆3,252 Updated last week
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆1,024 Updated 8 months ago
- Tool for generating high-quality synthetic datasets ☆1,450 Updated 2 months ago