Open source project for data preparation for GenAI applications
☆934May 15, 2026Updated 3 weeks ago
Alternatives and similar repositories for data-prep-kit
Users that are interested in data-prep-kit are comparing it to the libraries listed below.
- Build document-native LLM applications☆58Sep 11, 2024Updated last year
- A web app for rapidly prototyping AI agents and the lightweight web UIs that wrap them—build flows, preview interactions, and share agent…☆66Jun 1, 2026Updated last week
- Docling core data types and transformations☆259Updated this week
- Build production-ready AI agents in both Python and Typescript.☆3,278May 28, 2026Updated last week
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models☆382May 30, 2026Updated last week
- The official Python SDK for Codellm-Devkit☆18May 18, 2026Updated 3 weeks ago
- Docling Haystack integration☆29Apr 9, 2026Updated 2 months ago
- InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data…☆1,415Mar 30, 2026Updated 2 months ago
- Get your documents ready for gen AI☆60,897Updated this week
- Taxonomy tree that will allow you to create models tuned with your data☆298Sep 8, 2025Updated 9 months ago
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence☆1,249Jun 25, 2025Updated 11 months ago
- ☆206Updated this week
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.☆58Apr 22, 2026Updated last month
- The living Trust and Safety User Guide for the AI Alliance (https://thealliance.ai)☆16May 21, 2026Updated 2 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆3,077May 26, 2026Updated 2 weeks ago
- Estimate resources needed to train LLMs☆14Feb 10, 2026Updated 3 months ago
- Deploy, and share agents with open infrastructure, free from vendor lock-in.☆1,114Updated this week
- Evaluation framework for document processing models and services.☆75May 28, 2026Updated last week
- Scalable data pre processing and curation toolkit for LLMs☆1,601Updated this week
- A system for agentic LLM-powered data processing and ETL☆3,757Updated this week
- Automation for IBM Watson Deployments☆17Sep 17, 2025Updated 8 months ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries☆226Jan 24, 2025Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data …☆213May 27, 2026Updated last week
- Community maintained hardware plugin for vLLM on Spyre☆52Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,240Jun 1, 2026Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆23Mar 12, 2024Updated 2 years ago
- Fybrik☆132Sep 7, 2025Updated 9 months ago
- Running Docling as an API service☆1,575Updated this week
- Simple package to extract text with coordinates from programmatic PDFs☆285Jun 1, 2026Updated last week
- FMS Model Optimizer is a framework for developing reduced precision neural network models.☆21May 28, 2026Updated last week
- Chat with your website using LLMs☆80May 21, 2026Updated 2 weeks ago
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…☆14,841Updated this week
- Docling LangChain integration☆72Nov 17, 2025Updated 6 months ago
- Observability Volume Management☆41Mar 19, 2025Updated last year
- ☆271Jun 25, 2025Updated 11 months ago
- Making docling agentic through MCP☆644May 20, 2026Updated 2 weeks ago
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.☆237Sep 19, 2023Updated 2 years ago
- LM engine is a library for pretraining/finetuning LLMs☆174Updated this week
- Examples and guides for building Gen AI applications on the watsonx platform.☆48Updated this week