NVIDIA / nv-ingest
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
☆2,751Updated this week
Alternatives and similar repositories for nv-ingest
Users who are interested in nv-ingest are comparing it to the libraries listed below.
- A system for agentic LLM-powered data processing and ETL☆3,001Updated last week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,892Updated 3 weeks ago
- Build your own inference engine with expert control. Deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.☆3,591Updated this week
- Knowledge Agents and Management in the Cloud☆4,182Updated this week
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st…☆1,328Updated 5 months ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.☆7,198Updated 8 months ago
- RAG that intelligently adapts to your use case, data, and queries☆3,550Updated 4 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.☆1,437Updated last month
- Improved file parsing for LLMs☆3,112Updated 11 months ago
- Fast State-of-the-Art Static Embeddings☆1,863Updated last week
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid…☆2,671Updated this week
- ETL, Analytics, Versioning for Unstructured Data☆2,686Updated this week
- Detect and extract tables to markdown and csv☆752Updated 8 months ago
- A toolkit to create optimal Production-ready Retrieval Augmented Generation (RAG) setup for your data☆1,509Updated 5 months ago
- AdalFlow: The library to build & auto-optimize LLM applications.☆3,827Updated last week
- High-performance retrieval engine for unstructured data☆1,507Updated 2 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents☆5,860Updated this week
- ☆2,040Updated 7 months ago
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry☆4,258Updated last month
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…☆2,900Updated last month
- The python library for real-time communication☆4,343Updated last month
- Document to Markdown OCR library with Llama 3.2 vision☆2,410Updated 9 months ago
- The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and Prompt Optimization ✨☆2,551Updated last week
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,248Updated 2 weeks ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.☆2,760Updated 7 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL☆1,089Updated last week
- The most accurate document search and store for building AI apps☆3,309Updated last week
- open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for desig…☆935Updated 8 months ago
- Python library for Agentic Document Extraction from LandingAI☆2,109Updated last week
- OCR Benchmark☆575Updated 4 months ago