NVIDIA / nv-ingest
NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
☆2,701Updated this week
Alternatives and similar repositories for nv-ingest
Users who are interested in nv-ingest are comparing it to the libraries listed below.
- A system for agentic LLM-powered data processing and ETL☆2,354Updated this week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,282Updated 2 weeks ago
- The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and Prompt Optimization ✨☆2,186Updated this week
- Knowledge Agents and Management in the Cloud☆4,046Updated this week
- Fast State-of-the-Art Static Embeddings☆1,756Updated this week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.☆1,298Updated last month
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.☆6,560Updated 4 months ago
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st…☆1,147Updated 2 months ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid…☆2,608Updated this week
- RAG that intelligently adapts to your use case, data, and queries☆3,372Updated 3 weeks ago
- The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.☆3,382Updated this week
- 🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library☆1,725Updated this week
- LOTUS: A semantic query engine for fast and easy LLM-powered data processing☆1,240Updated this week
- Task-Aware Agent-driven Prompt Optimization Framework☆3,383Updated this week
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,009Updated this week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding☆2,207Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL☆1,033Updated last month
- Open Source Application for Advanced LLM + Diffusion Engineering: interact, train, fine-tune, and evaluate large language models on your …☆3,593Updated this week
- The edge and AI gateway for agentic apps. Arch handles the messy low-level work in building agents like applying guardrails, routing prom…☆3,135Updated this week
- A toolkit to create optimal Production-ready Retrieval Augmented Generation (RAG) setup for your data☆1,445Updated last month
- Cache-Augmented Generation: A Simple, Efficient Alternative to RAG☆1,335Updated last month
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…☆2,625Updated 2 months ago
- Improved file parsing for LLM’s☆3,016Updated 8 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents☆5,445Updated this week
- Open source multi-modal RAG for building AI apps over private knowledge.☆2,784Updated this week
- The python library for real-time communication☆4,115Updated last week
- ETL, Analytics, Versioning for Unstructured Data☆2,606Updated this week
- open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for desig…☆923Updated 5 months ago
- High-performance retrieval engine for unstructured data☆1,444Updated last week
- Document to Markdown OCR library with Llama 3.2 vision☆2,360Updated 5 months ago