2dogsandanerd / smart-ingest-kit
Stop using static chunk sizes. A lightweight, production-ready RAG ingestion toolkit. It uses Docling for layout-aware parsing and applies smart heuristics to choose a chunking strategy per document type (PDF vs. code vs. Markdown). Extracted from a production RAG platform.
☆64 · Updated last month
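The per-type chunking idea in the description can be illustrated with a short plain-Python sketch. This is a hypothetical illustration, not smart-ingest-kit's API: `CHUNK_PROFILES`, `classify`, `chunk_text`, and `chunk_file` are invented names, the sizes and overlaps are placeholder assumptions, and the splitter works on raw characters rather than on Docling's layout-aware output.

```python
from pathlib import Path

# Hypothetical per-type settings (in characters). Illustrative assumptions,
# not values taken from smart-ingest-kit.
CHUNK_PROFILES = {
    "pdf": {"chunk_size": 1000, "overlap": 150},
    "code": {"chunk_size": 600, "overlap": 50},
    "markdown": {"chunk_size": 800, "overlap": 100},
    "text": {"chunk_size": 800, "overlap": 100},
}

# Hypothetical set of extensions treated as source code.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java"}


def classify(path: str) -> str:
    """Map a file path to a profile name using its extension only."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in CODE_EXTENSIONS:
        return "code"
    if suffix in {".md", ".markdown"}:
        return "markdown"
    return "text"  # fallback for anything unrecognized


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail-only chunk repeats it.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_file(path: str, text: str) -> list[str]:
    """Pick chunk settings from the file type, then split the text."""
    profile = CHUNK_PROFILES[classify(path)]
    return chunk_text(text, profile["chunk_size"], profile["overlap"])
```

For example, `chunk_text("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. Keying on the extension keeps the sketch simple; a layout-aware pipeline would more plausibly split on parsed structure (headings, tables, functions) before applying any size limit.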
Alternatives and similar repositories for smart-ingest-kit
Users who are interested in smart-ingest-kit are comparing it to the libraries listed below.
- VeritasGraph: Enterprise-Grade Graph RAG for Secure, On-Premise AI with Verifiable Attribution ☆195 · Updated this week
- awesome-rag: a collection of awesome things related to Retrieval-Augmented Generation ☆175 · Updated 5 months ago
- One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond. ☆54 · Updated 3 weeks ago
- FACT – Fast Augmented Context Tools: FACT is a lean retrieval pattern that skips vector search. We cache every static token inside Claude… ☆129 · Updated 5 months ago
- Turn AI into a persistent, memory-powered collaborator. Universal MCP Server (supports HTTP, STDIO, and WebSocket) enabling cross-platfor… ☆243 · Updated 2 weeks ago
- A modern desktop application for exploring, managing, and analyzing vector databases ☆94 · Updated last week
- An OpenSource Deep Research library with reasoning ☆170 · Updated last month
- Cookbook for Pipelex, the declarative language for composable AI workflows. Devtool for agents and mere humans. ☆36 · Updated 2 weeks ago
- Chat with your data - with memory, rules, and observability built in. Deploy in 2 minutes ☆398 · Updated this week
- An open-source Text2SQL tool that transforms natural language into SQL using graph-powered schema understanding. Ask your database questi… ☆294 · Updated this week
- AI-powered text compression library for RAG systems and API calls. Reduce token usage by up to 50-60% while preserving semantic meaning w… ☆73 · Updated 4 months ago
- Shared Memory Storage for Multi-Agent Systems ☆138 · Updated 6 months ago
- A simple CPU-only OCR tool that converts PDF, image, Word, and Excel files to Markdown, with a Streamlit interface. ☆41 · Updated last month
- Multi-agent autonomous research system using LangGraph and LangChain. Generates citation-backed reports with credibility scoring and web … ☆120 · Updated last week
- An Excel AI agent that uses MCP tools to let LLMs read, edit, and automate Excel spreadsheets. ☆125 · Updated last month
- In the midst of all the tools out there that you can possibly use to keep track of them. Here's a "shovel" that just works to try them al… ☆113 · Updated this week
- ☆161 · Updated 2 months ago
- Grapheteria: A structured framework bringing uniformity to agent orchestration! ☆58 · Updated 6 months ago
- Deep research tool for local knowledge base. ☆151 · Updated last month
- Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG.… ☆83 · Updated this week
- Office document creation and editing skills for Claude Code - PPTX, DOCX, XLSX, and PDF workflows with automation support ☆171 · Updated 3 months ago
- ☆195 · Updated 5 months ago
- Workflows are an event-driven, async-first, step-based way to control the execution flow of AI applications like agents. ☆302 · Updated this week
- The lightweight framework for building agents ☆231 · Updated this week
- ☆210 · Updated 3 weeks ago
- The AI runtime that turns your framework functions into OpenAI-compatible endpoints ☆87 · Updated 10 months ago
- A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents. ☆97 · Updated 2 weeks ago
- ☆226 · Updated 8 months ago
- rudradb-opin-examples provides example implementations for the rudradb-opin package (pip install rudradb-opin) ☆29 · Updated 4 months ago
- Gemini CLI or Claude Code? Why not both? LangCode combines all CLI capabilities and models in one place ☂️! ☆435 · Updated last month