Open source project for data preparation for GenAI applications
☆946Jun 22, 2026Updated last week
Alternatives and similar repositories for data-prep-kit
Users who are interested in data-prep-kit are comparing it to the libraries listed below.
- Build document-native LLM applications☆58Sep 11, 2024Updated last year
- A web app for rapidly prototyping AI agents and the lightweight web UIs that wrap them—build flows, preview interactions, and share agent…☆66Jun 1, 2026Updated 3 weeks ago
- Docling core data types and transformations☆261Updated this week
- Build production-ready AI agents in both Python and Typescript.☆3,307Updated this week
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models☆382Updated this week
- The official Python SDK for Codellm-Devkit☆20Updated this week
- Docling Haystack integration☆29Apr 9, 2026Updated 2 months ago
- InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data…☆1,418Mar 30, 2026Updated 2 months ago
- Get your documents ready for gen AI☆62,000Updated this week
- Taxonomy tree that will allow you to create models tuned with your data☆299Sep 8, 2025Updated 9 months ago
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence☆1,250Jun 25, 2025Updated last year
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.☆58Apr 22, 2026Updated 2 months ago
- ☆207Jun 4, 2026Updated 3 weeks ago
- The living Trust and Safety User Guide for the AI Alliance (https://thealliance.ai)☆16Jun 12, 2026Updated 2 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆3,119May 26, 2026Updated last month
- Estimate resources needed to train LLMs☆14Feb 10, 2026Updated 4 months ago
- Deploy, and share agents with open infrastructure, free from vendor lock-in.☆1,128Jun 8, 2026Updated 3 weeks ago
- Evaluation framework for document processing models and services.☆76May 28, 2026Updated last month
- Scalable data pre processing and curation toolkit for LLMs☆1,633Updated this week
- A system for agentic LLM-powered data processing and ETL☆3,841Jun 17, 2026Updated last week
- Automation for IBM Watson Deployments☆17Sep 17, 2025Updated 9 months ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries☆226Jan 24, 2025Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data …☆214May 27, 2026Updated last month
- Community maintained hardware plugin for vLLM on Spyre☆52Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,300Jun 22, 2026Updated last week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆23Mar 12, 2024Updated 2 years ago
- Fybrik☆132Sep 7, 2025Updated 9 months ago
- Running Docling as an API service☆1,621Updated this week
- Simple package to extract text with coordinates from programmatic PDFs☆316Updated this week
- FMS Model Optimizer is a framework for developing reduced precision neural network models.☆21Updated this week
- Chat with your website using LLMs☆80May 21, 2026Updated last month
- Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean…☆15,002Updated this week
- Docling LangChain integration☆72Nov 17, 2025Updated 7 months ago
- Transform unstructured documents into validated, rich and queryable knowledge graphs.☆167Updated this week
- Observability Volume Management☆41Mar 19, 2025Updated last year
- ☆270Jun 25, 2025Updated last year
- Making docling agentic through MCP☆665Jun 15, 2026Updated 2 weeks ago
- LM engine is a library for pretraining/finetuning LLMs☆181Updated this week
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.☆236Sep 19, 2023Updated 2 years ago