mlabonne / llm-datasets
High-quality datasets, tools, and concepts for LLM fine-tuning.
☆1,965 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for llm-datasets
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,609 Updated this week
- PyTorch native finetuning library ☆4,267 Updated this week
- ☆2,732 Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,026 Updated last week
- ☆1,260 Updated this week
- A reading list on LLM based Synthetic Data Generation 🔥 ☆761 Updated this week
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆1,797 Updated last week
- LLM Finetuning with peft ☆2,145 Updated 4 months ago
- Tools for merging pretrained large language models. ☆4,788 Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,031 Updated 2 months ago
- [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge a… ☆1,351 Updated 3 months ago
- Optimizing inference proxy for LLMs ☆1,329 Updated this week
- ReFT: Representation Finetuning for Language Models ☆1,145 Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,548 Updated 2 months ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ☆3,207 Updated 2 months ago
- A quick guide (especially) for trending instruction finetuning datasets ☆2,603 Updated 11 months ago
- ☆522 Updated last week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆764 Updated this week
- Training LLMs with QLoRA + FSDP ☆1,419 Updated this week
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,136 Updated 3 months ago
- ☆1,878 Updated last week
- ☆903 Updated this week
- Collection of notebook guides created by the Brev.dev team! ☆1,660 Updated last week
- Summarize existing representative LLMs text datasets. ☆982 Updated 2 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆792 Updated 2 months ago
- Robust recipes to align language models with human and AI preferences ☆4,663 Updated last month
- A native PyTorch Library for large model training ☆2,566 Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,035 Updated this week
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,333 Updated 6 months ago
- Open-source AI cookbook ☆1,667 Updated this week