NVIDIA / nv-ingest
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
☆2,763Updated this week
Alternatives and similar repositories for nv-ingest
Users that are interested in nv-ingest are comparing it to the libraries listed below
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,910Updated 2 months ago
- A system for agentic LLM-powered data processing and ETL☆3,101Updated last week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.☆7,234Updated 9 months ago
- Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.☆3,715Updated this week
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st…☆1,387Updated 6 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,335Updated 2 weeks ago
- RAG that intelligently adapts to your use case, data, and queries☆3,600Updated 3 weeks ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.☆1,456Updated 3 months ago
- Improved file parsing for LLMs☆3,134Updated last year
- Knowledge Agents and Management in the Cloud☆4,205Updated this week
- A toolkit to create an optimal production-ready Retrieval Augmented Generation (RAG) setup for your data☆1,516Updated 6 months ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid…☆2,684Updated last month
- Document to Markdown OCR library with Llama 3.2 vision☆2,420Updated 10 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL☆1,109Updated 3 weeks ago
- Fast State-of-the-Art Static Embeddings☆1,907Updated 2 weeks ago
- Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for desig…☆933Updated 9 months ago
- 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines☆3,244Updated 2 weeks ago
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…☆2,940Updated 2 months ago
- The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and Prompt Optimization ✨☆2,631Updated this week
- AdalFlow: The library to build & auto-optimize LLM applications.☆3,881Updated last month
- The most accurate document search and store for building AI apps☆3,383Updated this week
- High-performance retrieval engine for unstructured data☆1,531Updated 2 weeks ago
- 🦾 Take control of your AI agents☆1,384Updated 3 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.☆2,779Updated 9 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents☆5,963Updated this week
- The data plane for agents. Arch is a models-native proxy server that handles the plumbing work in AI: agent routing & hand off, guardrail…☆4,386Updated last week
- AI-Powered Data Processing: Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, acc…☆1,355Updated last week
- This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems.☆2,338Updated 9 months ago
- Full toolkit for running an AI agent service built with LangGraph, FastAPI and Streamlit☆3,870Updated this week
- Task-Aware Agent-driven Prompt Optimization Framework☆3,693Updated last month