NanoNets / ocr-python
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
☆119 Updated 2 years ago
Alternatives and similar repositories for ocr-python
Users who are interested in ocr-python are comparing it to the libraries listed below.
- Collection of PDF parsing libraries like AI based docling, claude, openai, gemini, meta's llama-vision, unstructured-io, and pdfminer, py… ☆149 Updated last month
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆29 Updated 2 years ago
- ☆71 Updated 10 months ago
- Multimodal document parser for high quality data understanding and extraction ☆82 Updated last week
- ☆55 Updated 4 months ago
- Using GPT-4 Vision and GPT-4 Turbo, take a PDF as input and get a markdown file as output. ☆96 Updated 8 months ago
- Data extraction with LLM on CPU ☆267 Updated last year
- Streamly - Streamlit Assistant is designed to provide the latest updates from Streamlit, generate code snippets for Streamlit widgets, an… ☆102 Updated last year
- Open-source RAG evaluation through users' feedback ☆205 Updated last year
- ☆124 Updated 7 months ago
- PDF intelligence platform combining IBM Docling for document processing, LlamaIndex for data structuring, and Streamlit for a powerful UI… ☆51 Updated 9 months ago
- like firecrawl.dev but free ☆49 Updated 7 months ago
- Groq goes brrrrr... so had to make a basic Streamlit app you can build upon! ☆83 Updated 9 months ago
- A Chat App built with embedchain and streamlit ☆41 Updated 2 years ago
- A simple NPM interface for seamlessly interacting with 36 Large Language Model (LLM) providers, including OpenAI, Anthropic, Google Gemin… ☆114 Updated last week
- Simple package to extract text with coordinates from programmatic PDFs ☆202 Updated last month
- Examples of integrating the OpenRouter API ☆233 Updated 6 months ago
- ☆46 Updated last year
- Convert Word documents to beautiful Markdown. Via command line or in your browser. ☆148 Updated this week
- A set of reusable AI agents for document processing ☆96 Updated 9 months ago
- A package for visualising Chroma vector collections in 3D ☆108 Updated last year
- This repository demonstrates how to leverage OpenAI's GPT-4 models with JSON Strict Mode to extract structured data from web pages. It c… ☆17 Updated last year
- LLM Siri with OpenAI, Perplexity, Ollama, Llama2, Mistral, Mixtral & Langchain ☆60 Updated last year
- Reliable RAG setup that uses Semantic Double Merging Chunking from llamaindex, Qdrant Hybrid Search, colBERT for reranking and Google Gem… ☆41 Updated 10 months ago
- react + next.js dashboard for R2R: The most advanced AI retrieval system. Containerized, Retrieval-Augmented Generation (RAG) with a REST… ☆167 Updated 5 months ago
- MCP server for Hugging Face dataset viewer ☆29 Updated 5 months ago
- PDF text data extraction web app with OCR for scanned documents ☆90 Updated last year
- WhisperAnywhere: Effortless speech-to-text everywhere on your Mac. Use a hotkey to dictate in any app, powered by Whisper AI and Groq API… ☆34 Updated last year
- A project that enables identification and classification of the intent of a message with dynamic labels ☆44 Updated 10 months ago
- Chat with your documents (PDF, TXT, DOCX, ODT, PPTX, etc.), websites, YouTube, and CSV files. Uses langchain, Ollama, Groq, Gemini,… ☆55 Updated last year