Stop using static chunk sizes. A lightweight, production-ready RAG ingestion toolkit. Uses Docling for layout-aware parsing and applies smart heuristics for optimal chunking (PDF vs Code vs MD). Extracted from a production RAG platform.
★69 · Mar 15, 2026 · Updated 2 months ago
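As a rough illustration of what choosing chunk settings per document type (rather than one static size) can look like, here is a minimal sketch. It is not taken from smart-ingest-kit: the names `CHUNK_PROFILES`, `profile_for`, and `chunk_text`, and all of the size and overlap values, are hypothetical and chosen only for the example. It also splits on raw characters, whereas the toolkit is described as using Docling's layout-aware parsing.

```python
from pathlib import Path

# Hypothetical per-type settings, measured in characters.
# The values are illustrative only, not the toolkit's defaults.
CHUNK_PROFILES = {
    "pdf": {"size": 800, "overlap": 120},
    "code": {"size": 256, "overlap": 0},
    "markdown": {"size": 400, "overlap": 40},
}
EXTENSION_TO_TYPE = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".py": "code",
    ".js": "code",
    ".ts": "code",
}
DEFAULT_PROFILE = {"size": 512, "overlap": 64}


def profile_for(filename: str) -> dict:
    """Pick chunk settings from the file extension, falling back to a default."""
    doc_type = EXTENSION_TO_TYPE.get(Path(filename).suffix.lower())
    return CHUNK_PROFILES.get(doc_type, DEFAULT_PROFILE)


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    settings = profile_for("handbook.pdf")
    chunks = chunk_text("some extracted text " * 100, **settings)
    print(len(chunks), "chunks using", settings)
```

A real pipeline would more likely split on structural boundaries (headings, functions, paragraphs) than on character offsets; the point here is only that the settings are looked up per document type.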
Alternatives and similar repositories for smart-ingest-kit
Users that are interested in smart-ingest-kit are comparing it to the libraries listed below.
- Self-Extensible Multi Agent Assistant ★58 · Feb 13, 2026 · Updated 3 months ago
- FastAPI + MLX offline-first voice agent with <1s latency. Minimal UI ★56 · Oct 21, 2025 · Updated 7 months ago
- A Docker-powered RAG system that understands the difference between code and prose. Ingest your codebase and documentation, then query th… ★256 · Mar 14, 2026 · Updated 2 months ago
- A simple CPU only OCR for pdf/images/word/excel to markdown. With streamlit. ★50 · Jan 26, 2026 · Updated 4 months ago
- BrainAPI is a knowledge graph-powered AI memory layer that transforms unstructured data into structured knowledge, enabling intelligent s… ★125 · Updated this week
- ★52 · Nov 18, 2025 · Updated 6 months ago
- A Python-native Terminal-Based Git Client - Navigate and manage your Git repositories with a beautiful TUI interface inspired by LazyGit. ★36 · Feb 7, 2026 · Updated 4 months ago
- ★66 · Updated this week
- A Paperless-ngx consume script that leverages Docling to provide superior OCR and layout analysis for PDFs, Office documents, and images. ★16 · Dec 7, 2025 · Updated 6 months ago
- A modern desktop application for exploring, managing, and analyzing vector databases ★241 · May 29, 2026 · Updated last week
- Professional RAG development skills for Claude Code - audit, evaluate, optimize, and scaffold RAG pipelines ★32 · Jan 18, 2026 · Updated 4 months ago
- A framework for creating message-driven training systems with PyTorch ★21 · Oct 7, 2025 · Updated 8 months ago
- a high-quality, GPU-accelerated image resizer ★15 · Mar 6, 2025 · Updated last year
- Open Source Public Repo of Microsoft Data & AI Platform ★35 · Nov 10, 2025 · Updated 6 months ago
- A collection of pipelines for Scrapy ★16 · Apr 27, 2026 · Updated last month
- ★39 · Nov 17, 2025 · Updated 6 months ago
- ★27 · Aug 16, 2025 · Updated 9 months ago
- ★126 · May 17, 2026 · Updated 3 weeks ago
- PRIMAVERA Extensibility Essentials ★16 · Nov 16, 2022 · Updated 3 years ago
- [H] HyperspaceDB is a high-performance vector database. It features 1-bit quantization, async replication, and native support for hierar… ★115 · Apr 20, 2026 · Updated last month
- A library for structural-semantic chunking of documents. ★13 · Oct 8, 2025 · Updated 8 months ago
- Session-Driven Development - Maintain perfect context across AI coding sessions with Claude Code ★61 · Jan 16, 2026 · Updated 4 months ago
- Coordinate skills between Codex, Copilot, and Claude Code. Validates, analyzes, and syncs skills, subagents, commands, and configuration … ★67 · May 31, 2026 · Updated last week
- ★47 · May 19, 2025 · Updated last year
- Pluggable sample-level metadata versioning for incremental multimodal pipelines. ★106 · Updated this week
- Linear Algebra library in GDScript for Godot Engine ★14 · Aug 10, 2020 · Updated 5 years ago
- InfraMind: Fine-tuning toolkit for training SLMs on Infrastructure-as-Code using GRPO/DAPO. Achieves 97.3% accuracy on IaC generation. ★68 · Dec 15, 2025 · Updated 5 months ago
- Unofficial Logsnag client for Elixir ★13 · May 11, 2025 · Updated last year
- A little tool to write GitHub actions in Elixir ★12 · Mar 6, 2026 · Updated 3 months ago
- Local CLI tool that lets you write natural language instructions and get the corresponding shell commands generated by a small language m… ★21 · Nov 18, 2025 · Updated 6 months ago
- Cross-machine AI agent communication, plus a mobile app to control any terminal on your machine. ★135 · Updated this week
- A script for Adobe Photoshop that randomly perturbs the font attributes of a text layer for each character in the layer ★23 · Apr 30, 2020 · Updated 6 years ago
- Scripts to automatically sync Claude Code generated TODO to TaskWarrior ★17 · Jun 22, 2025 · Updated 11 months ago
- Deep research agentic system using Time Test Diffusion ★45 · Dec 11, 2025 · Updated 5 months ago
- An end-to-end ES/CQRS example with EventStoreDB and Elixir ★12 · Jun 14, 2024 · Updated last year
- ★12 · Jan 15, 2024 · Updated 2 years ago
- Production-ready Python library for multi-provider LLM orchestration ★41 · Updated this week
- Ivar is an adapter based HTTP client that provides the ability to build composable HTTP requests. ★17 · Oct 5, 2017 · Updated 8 years ago
- Tree-based, vectorless document RAG framework. Connect any LLM via URL/API key. ★38 · Apr 7, 2026 · Updated 2 months ago