Open source project for data preparation for GenAI applications
☆915 · Mar 13, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for data-prep-kit
Users interested in data-prep-kit are comparing it to the libraries listed below.
- Build document-native LLM applications ☆57 · Sep 11, 2024 · Updated last year
- A web app for rapidly prototyping AI agents and the lightweight web UIs that wrap them: build flows, preview interactions, and share agent… ☆63 · Mar 19, 2026 · Updated 3 weeks ago
- Docling core data types and transformations ☆240 · Apr 1, 2026 · Updated last week
- Build production-ready AI agents in both Python and Typescript. ☆3,201 · Mar 27, 2026 · Updated 2 weeks ago
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models ☆359 · Mar 20, 2026 · Updated 2 weeks ago
- The official Python SDK for Codellm-Devkit ☆16 · Mar 5, 2026 · Updated last month
- Docling Haystack integration ☆29 · Jan 13, 2025 · Updated last year
- InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data… ☆1,414 · Mar 30, 2026 · Updated last week
- Get your documents ready for gen AI ☆57,163 · Updated this week
- Taxonomy tree that will allow you to create models tuned with your data ☆294 · Sep 8, 2025 · Updated 7 months ago
- ☆198 · Updated this week
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence ☆1,250 · Jun 25, 2025 · Updated 9 months ago
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆56 · Mar 30, 2026 · Updated last week
- The living Trust and Safety User Guide for the AI Alliance (https://thealliance.ai) ☆15 · Feb 15, 2026 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,978 · Apr 2, 2026 · Updated last week
- Estimate resources needed to train LLMs ☆14 · Feb 10, 2026 · Updated last month
- Deploy, and share agents with open infrastructure, free from vendor lock-in. ☆1,054 · Apr 3, 2026 · Updated last week
- Evaluation framework for document processing models and services. ☆67 · Apr 2, 2026 · Updated last week
- Scalable data pre processing and curation toolkit for LLMs ☆1,520 · Updated this week
- A system for agentic LLM-powered data processing and ETL ☆3,702 · Mar 27, 2026 · Updated last week
- Automation for IBM Watson Deployments ☆17 · Sep 17, 2025 · Updated 6 months ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆222 · Jan 24, 2025 · Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆211 · Feb 16, 2026 · Updated last month
- Community maintained hardware plugin for vLLM on Spyre ☆50 · Apr 2, 2026 · Updated last week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,155 · Mar 30, 2026 · Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 · Mar 12, 2024 · Updated 2 years ago
- Running Docling as an API service ☆1,398 · Updated this week
- Fybrik ☆132 · Sep 7, 2025 · Updated 7 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆262 · Updated this week
- FMS Model Optimizer is a framework for developing reduced precision neural network models. ☆21 · Mar 30, 2026 · Updated last week
- Transform unstructured documents into validated, rich and queryable knowledge graphs. ☆123 · Updated this week
- Chat with your website using LLMs ☆79 · Feb 19, 2026 · Updated last month
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,383 · Apr 3, 2026 · Updated last week
- Making docling agentic through MCP ☆555 · Updated this week
- Docling LangChain integration ☆67 · Nov 17, 2025 · Updated 4 months ago
- Examples and guides for building Gen AI applications on the watsonx platform. ☆45 · Mar 27, 2026 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆163 · Updated this week
- ☆271 · Jun 25, 2025 · Updated 9 months ago
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud. ☆234 · Sep 19, 2023 · Updated 2 years ago