mlabonne / llm-datasets
High-quality datasets, tools, and concepts for LLM fine-tuning.
☆2,010 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for llm-datasets
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,634 Updated this week
- A reading list on LLM based Synthetic Data Generation 🔥 ☆791 Updated 2 weeks ago
- ☆1,271 Updated 2 weeks ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆798 Updated 2 weeks ago
- LLM Finetuning with peft ☆2,164 Updated 4 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,045 Updated this week
- ☆2,746 Updated 2 months ago
- ☆532 Updated 3 weeks ago
- nanoGPT style version of Llama 3.1 ☆1,246 Updated 3 months ago
- A summary of existing representative LLM text datasets. ☆1,008 Updated 2 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,565 Updated 3 months ago
- Tools for merging pretrained large language models. ☆4,816 Updated 2 weeks ago
- ReFT: Representation Finetuning for Language Models ☆1,159 Updated 2 weeks ago
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆1,824 Updated 2 weeks ago
- DataComp for Language Models ☆1,157 Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,057 Updated 2 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,095 Updated last week
- Optimizing inference proxy for LLMs ☆1,563 Updated this week
- ☆935 Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆811 Updated this week
- PyTorch native finetuning library ☆4,336 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ☆3,256 Updated 3 months ago
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,149 Updated 3 months ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜 ☆890 Updated 2 months ago
- [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge a… ☆1,378 Updated 3 months ago
- Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' ☆1,296 Updated last month
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆797 Updated 2 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,205 Updated this week
- AdalFlow: The library to build & auto-optimize LLM applications. ☆2,074 Updated this week
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research ☆1,335 Updated this week