NVIDIA / nv-ingest
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
☆2,803Updated this week
Alternatives and similar repositories for nv-ingest
Users who are interested in nv-ingest are comparing it to the libraries listed below.
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,929Updated 3 months ago
- A system for agentic LLM-powered data processing and ETL☆3,403Updated 2 weeks ago
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.☆3,766Updated last week
- Knowledge Agents and Management in the Cloud☆4,225Updated this week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.☆1,475Updated 4 months ago
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st…☆1,413Updated 8 months ago
- Fast State-of-the-Art Static Embeddings☆1,982Updated 2 weeks ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding☆2,628Updated last week
- RAG that intelligently adapts to your use case, data, and queries☆3,671Updated 2 months ago
- Improved file parsing for LLMs☆3,150Updated last year
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry☆4,312Updated last month
- The most accurate document search and store for building AI apps☆3,442Updated last week
- A toolkit to create an optimal production-ready Retrieval Augmented Generation (RAG) setup for your data☆1,520Updated 7 months ago
- High-performance retrieval engine for unstructured data☆1,549Updated 2 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.☆2,819Updated this week
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid…☆2,726Updated last week
- Deploy your agentic workflows to production☆2,070Updated last month
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,442Updated last week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.☆7,253Updated 10 months ago
- 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines☆3,574Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,820Updated 8 months ago
- Developer APIs to Accelerate LLM Projects☆1,742Updated last year
- open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for desig…☆932Updated 11 months ago
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…☆2,966Updated last month
- AI-Powered Data Processing: Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, acc…☆1,525Updated last week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL☆1,132Updated this week
- ContextGem: Effortless LLM extraction from documents☆1,755Updated 3 weeks ago
- AdalFlow: The library to build & auto-optimize LLM applications.☆3,990Updated 2 weeks ago
- Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images☆2,717Updated this week
- The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and Prompt Optimization ✨☆2,720Updated this week