Open source project for data preparation for GenAI applications
☆903 Feb 16, 2026 Updated last week
Alternatives and similar repositories for data-prep-kit
Users who are interested in data-prep-kit are comparing it to the libraries listed below
- Build document-native LLM applications ☆56 Sep 11, 2024 Updated last year
- Build production-ready AI agents in both Python and Typescript. ☆3,119 Feb 20, 2026 Updated last week
- Deploy and share agents with open infrastructure, free from vendor lock-in. ☆989 Feb 20, 2026 Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,903 Updated this week
- Docling Haystack integration ☆27 Jan 13, 2025 Updated last year
- Get your documents ready for gen AI ☆54,094 Updated this week
- Docling core data types and transformations ☆228 Feb 20, 2026 Updated last week
- InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data… ☆1,409 Feb 16, 2026 Updated last week
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆56 Feb 2, 2026 Updated 3 weeks ago
- Taxonomy tree that will allow you to create models tuned with your data ☆291 Sep 8, 2025 Updated 5 months ago
- A system for agentic LLM-powered data processing and ETL ☆3,636 Feb 2, 2026 Updated 3 weeks ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,100 Feb 16, 2026 Updated last week
- Evaluation framework for document processing models and services. ☆63 Feb 12, 2026 Updated 2 weeks ago
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models ☆350 Feb 13, 2026 Updated 2 weeks ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 Mar 12, 2024 Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆211 Feb 16, 2026 Updated last week
- A web app for rapidly prototyping AI agents and the lightweight web UIs that wrap them—build flows, preview interactions, and share agent… ☆62 Feb 10, 2026 Updated 2 weeks ago
- The official Python SDK for Codellm-Devkit ☆17 Feb 16, 2026 Updated last week
- Scalable data preprocessing and curation toolkit for LLMs ☆1,409 Feb 21, 2026 Updated last week
- ☆187 Feb 20, 2026 Updated last week
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence ☆1,243 Jun 25, 2025 Updated 8 months ago
- LM engine is a library for pretraining/finetuning LLMs ☆118 Updated this week
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,011 Feb 20, 2026 Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,311 Feb 20, 2026 Updated last week
- Curated list of datasets and tools for post-training. ☆4,265 Nov 10, 2025 Updated 3 months ago
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆37,083 Updated this week
- Estimate resources needed to train LLMs ☆14 Feb 10, 2026 Updated 2 weeks ago
- AdalFlow: The library to build & auto-optimize LLM applications. ☆4,049 Feb 10, 2026 Updated 2 weeks ago
- Knowledge Agents and Management in the Cloud ☆4,235 Feb 17, 2026 Updated last week
- AI Observability & Evaluation ☆8,666 Updated this week
- Composable building blocks to build LLM Apps ☆8,275 Updated this week
- Prompt Declaration Language (PDL) is a declarative prompt programming language. ☆283 Updated this week
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆133 Oct 8, 2025 Updated 4 months ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆220 Jan 24, 2025 Updated last year
- ☆162 Dec 2, 2024 Updated last year
- This project makes running the InstructLab large language model (LLM) fine-tuning process easy and flexible on OpenShift ☆27 Aug 27, 2025 Updated 6 months ago
- Structured Outputs ☆13,456 Feb 13, 2026 Updated 2 weeks ago
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,693 Nov 7, 2025 Updated 3 months ago
- DSPy: The framework for programming—not prompting—language models ☆32,381 Updated this week