NVIDIA/nv-ingest

NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.

☆2,851

Alternatives and similar repositories for nv-ingest

Users who are interested in nv-ingest are comparing it to the repositories listed below.

fixie-ai / ultravox
A fast multimodal LLM for real-time voice
☆4,367 · Dec 12, 2025 · Updated 2 months ago
docling-project / docling
Get your documents ready for gen AI
☆54,754 · Updated this week
QuivrHQ / MegaParse
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
☆7,342 · Feb 21, 2025 · Updated last year
stanford-oval / storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
☆27,949 · Sep 30, 2025 · Updated 5 months ago
microsoft / PromptWizard
Task-Aware Agent-driven Prompt Optimization Framework
☆3,805 · Oct 13, 2025 · Updated 4 months ago
Zipstack / unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
☆6,452 · Updated this week
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,168 · Updated this week
getomni-ai / zerox
OCR & Document Extraction using vision models
☆12,155 · May 20, 2025 · Updated 9 months ago
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆19,360 · Feb 24, 2026 · Updated last week
OpenSPG / KAG
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning a…
☆8,574 · Jan 28, 2026 · Updated last month
microsoft / data-formulator
🪄 Create rich visualizations with AI
☆15,103 · Updated this week
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆16,947 · Feb 19, 2026 · Updated last week
microsoft / TinyTroupe
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
☆7,295 · Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆32,519 · Updated this week
llamastack / llama-stack
Composable building blocks to build LLM Apps
☆8,278 · Updated this week
agno-agi / agno
Build, run, manage agentic software at scale.
☆38,276 · Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆32,069 · Updated this week
Canner / WrenAI
⚡️ GenBI (Generative BI) queries any database in natural language, generates accurate SQL (Text-to-SQL), charts (Text-to-Chart), and AI-p…
☆14,528 · Updated this week
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆14,074 · Updated this week
pydantic / pydantic-ai
GenAI Agent Framework, the Pydantic way
☆15,120 · Updated this week
run-llama / llama_cloud_services
Knowledge Agents and Management in the Cloud
☆4,235 · Feb 17, 2026 · Updated 2 weeks ago
openai / swarm
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
☆21,026 · Mar 11, 2025 · Updated 11 months ago
getzep / graphiti
Build Real-Time Knowledge Graphs for AI Agents
☆23,192 · Updated this week
BerriAI / litellm
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a…
☆37,083 · Updated this week
pingcap / autoflow
pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid…
☆2,738 · Jan 9, 2026 · Updated last month
mem0ai / mem0
Universal memory layer for AI Agents
☆48,604 · Updated this week
browserbase / stagehand
The AI Browser Automation Framework
☆21,261 · Updated this week
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆61,332 · Updated this week
unslothai / unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
☆53,029 · Updated this week
huggingface / smolagents
🤗 smolagents: a barebones library for agents that think in code.
☆25,615 · Feb 21, 2026 · Updated last week
microsoft / markitdown
Python tool for converting files and office documents to Markdown.
☆88,637 · Feb 20, 2026 · Updated last week
khoj-ai / khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. …
☆32,752 · Feb 24, 2026 · Updated last week
microsoft / graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
☆31,162 · Updated this week
awslabs / agent-squad
Flexible and powerful framework for managing multiple AI agents and handling complex conversations
☆7,472 · Feb 11, 2026 · Updated 3 weeks ago
Skyvern-AI / skyvern
Automate browser based workflows with AI
☆20,629 · Updated this week
lumina-ai-inc / chunkr
Vision infrastructure to turn complex documents into RAG/LLM-ready data
☆2,940 · Sep 24, 2025 · Updated 5 months ago
livekit / agents
A framework for building realtime voice AI agents 🤖🎙️📹
☆9,441 · Updated this week
zaidmukaddam / scira
Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet and cites it too. …
☆11,472 · Feb 10, 2026 · Updated 3 weeks ago
assafelovic / gpt-researcher
An autonomous agent that conducts deep research on any data using any LLM providers.
☆25,472 · Feb 21, 2026 · Updated last week