h2oai / doctr
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
☆12Updated 2 months ago
Alternatives and similar repositories for doctr:
Users who are interested in doctr are comparing it to the libraries listed below.
- Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents.☆25Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆64Updated 4 months ago
- An integration of Qdrant ANN vector database backend with txtai☆24Updated 6 months ago
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform☆37Updated last year
- This repository serves as a collection of scrapers procuring and structuring various legal datasets☆17Updated last year
- 💙 Unstructured Data Connectors for Haystack 2.0☆16Updated last year
- Dataset Viber is your chill repo for data collection, annotation and vibe checks.☆45Updated 5 months ago
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆32Updated 6 months ago
- Using short models to classify long texts☆21Updated last year
- Generalist and Lightweight Model for Text Classification☆87Updated this week
- NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, …☆78Updated 3 months ago
- This project is a versatile and powerful search tool that leverages state-of-the-art natural language processing models to provide releva…☆12Updated last year
- Pandas-LLM☆37Updated last year
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas…☆78Updated 2 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.☆76Updated last year
- Speak (speech-to-text) to LLMs (Ollama) in any language - Streamlit app☆40Updated last year
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB.☆58Updated 2 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.☆63Updated 5 months ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models…☆37Updated 2 years ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Updated 4 months ago
- Efficient few-shot learning with cross-encoders.☆49Updated last year
- Data extraction with LLM on CPU☆67Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆34Updated 2 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆49Updated 7 months ago