Open source project for data preparation for GenAI applications
☆911 · Mar 13, 2026 · Updated last week
Alternatives and similar repositories for data-prep-kit
Users who are interested in data-prep-kit are comparing it to the libraries listed below.
- Build document-native LLM applications ☆57 · Sep 11, 2024 · Updated last year
- A web app for rapidly prototyping AI agents and the lightweight web UIs that wrap them: build flows, preview interactions, and share agent… ☆63 · Updated this week
- Docling core data types and transformations ☆234 · Mar 13, 2026 · Updated last week
- Build production-ready AI agents in both Python and Typescript. ☆3,168 · Mar 1, 2026 · Updated 2 weeks ago
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models ☆351 · Mar 10, 2026 · Updated last week
- Docling Haystack integration ☆28 · Jan 13, 2025 · Updated last year
- The official Python SDK for Codellm-Devkit ☆16 · Mar 5, 2026 · Updated 2 weeks ago
- InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data… ☆1,410 · Feb 16, 2026 · Updated last month
- Get your documents ready for gen AI ☆55,944 · Updated this week
- Taxonomy tree that will allow you to create models tuned with your data ☆292 · Sep 8, 2025 · Updated 6 months ago
- ☆192 · Mar 9, 2026 · Updated last week
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence ☆1,248 · Jun 25, 2025 · Updated 8 months ago
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆56 · Mar 9, 2026 · Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,956 · Updated this week
- The living Trust and Safety User Guide for the AI Alliance (https://thealliance.ai) ☆15 · Feb 15, 2026 · Updated last month
- Deploy, and share agents with open infrastructure, free from vendor lock-in. ☆1,044 · Updated this week
- Estimate resources needed to train LLMs ☆14 · Feb 10, 2026 · Updated last month
- Evaluation framework for document processing models and services. ☆65 · Mar 11, 2026 · Updated last week
- Scalable data pre processing and curation toolkit for LLMs ☆1,460 · Updated this week
- A system for agentic LLM-powered data processing and ETL ☆3,690 · Mar 12, 2026 · Updated last week
- Automation for IBM Watson Deployments ☆17 · Sep 17, 2025 · Updated 6 months ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆222 · Jan 24, 2025 · Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆211 · Feb 16, 2026 · Updated last month
- Community maintained hardware plugin for vLLM on Spyre ☆47 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,121 · Mar 9, 2026 · Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 · Mar 12, 2024 · Updated 2 years ago
- Running Docling as an API service ☆1,340 · Mar 9, 2026 · Updated last week
- Fybrik ☆132 · Sep 7, 2025 · Updated 6 months ago
- FMS Model Optimizer is a framework for developing reduced precision neural network models. ☆21 · Updated this week
- Chat with your website using LLMs ☆79 · Feb 19, 2026 · Updated last month
- Making docling agentic through MCP ☆515 · Updated this week
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,282 · Updated this week
- LM engine is a library for pretraining/finetuning LLMs ☆136 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆256 · Mar 9, 2026 · Updated last week
- Docling LangChain integration ☆66 · Nov 17, 2025 · Updated 4 months ago
- Examples and guides for building Gen AI applications on the watsonx platform. ☆45 · Feb 9, 2026 · Updated last month
- Observability Volume Management ☆41 · Mar 19, 2025 · Updated last year
- ☆271 · Jun 25, 2025 · Updated 8 months ago
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud. ☆234 · Sep 19, 2023 · Updated 2 years ago