Open source project for data preparation for GenAI applications
☆928 · Mar 13, 2026 · Updated last month
Alternatives and similar repositories for data-prep-kit
Users that are interested in data-prep-kit are comparing it to the libraries listed below.
- Build document-native LLM applications ☆58 · Sep 11, 2024 · Updated last year
- A web app for rapidly prototyping AI agents and the lightweight web UIs that wrap them—build flows, preview interactions, and share agent… ☆64 · Apr 21, 2026 · Updated last week
- Docling core data types and transformations ☆246 · Apr 22, 2026 · Updated last week
- Build production-ready AI agents in both Python and Typescript. ☆3,224 · Updated this week
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models ☆368 · Apr 15, 2026 · Updated 2 weeks ago
- The official Python SDK for Codellm-Devkit ☆16 · Apr 14, 2026 · Updated 2 weeks ago
- Docling Haystack integration ☆29 · Apr 9, 2026 · Updated 3 weeks ago
- InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data… ☆1,415 · Mar 30, 2026 · Updated 3 weeks ago
- Get your documents ready for gen AI ☆58,638 · Updated this week
- Taxonomy tree that will allow you to create models tuned with your data ☆296 · Sep 8, 2025 · Updated 7 months ago
- ☆200 · Apr 23, 2026 · Updated last week
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence ☆1,249 · Jun 25, 2025 · Updated 10 months ago
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆57 · Apr 22, 2026 · Updated last week
- The living Trust and Safety User Guide for the AI Alliance (https://thealliance.ai) ☆15 · Apr 6, 2026 · Updated 3 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆3,015 · Apr 20, 2026 · Updated last week
- Estimate resources needed to train LLMs ☆14 · Feb 10, 2026 · Updated 2 months ago
- Deploy, and share agents with open infrastructure, free from vendor lock-in. ☆1,078 · Updated this week
- Evaluation framework for document processing models and services. ☆70 · Updated this week
- Scalable data pre processing and curation toolkit for LLMs ☆1,538 · Apr 23, 2026 · Updated last week
- A system for agentic LLM-powered data processing and ETL ☆3,728 · Mar 27, 2026 · Updated last month
- Automation for IBM Watson Deployments ☆17 · Sep 17, 2025 · Updated 7 months ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆227 · Jan 24, 2025 · Updated last year
- Community maintained hardware plugin for vLLM on Spyre ☆50 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,189 · Apr 20, 2026 · Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 · Mar 12, 2024 · Updated 2 years ago
- Running Docling as an API service ☆1,469 · Updated this week
- Fybrik ☆132 · Sep 7, 2025 · Updated 7 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆272 · Updated this week
- FMS Model Optimizer is a framework for developing reduced precision neural network models. ☆21 · Apr 3, 2026 · Updated 3 weeks ago
- Chat with your website using LLMs ☆79 · Feb 19, 2026 · Updated 2 months ago
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,536 · Apr 20, 2026 · Updated last week
- Docling LangChain integration ☆69 · Nov 17, 2025 · Updated 5 months ago
- Making docling agentic through MCP ☆591 · Apr 21, 2026 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆165 · Updated this week
- Observability Volume Management ☆41 · Mar 19, 2025 · Updated last year
- ☆272 · Jun 25, 2025 · Updated 10 months ago
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud. ☆234 · Sep 19, 2023 · Updated 2 years ago
- Examples and guides for building Gen AI applications on the watsonx platform. ☆46 · Mar 27, 2026 · Updated last month
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆45,153 · Updated this week