butlerlabs / docai
DocAI helps developers quickly build document, image, and text processing pipelines using open source and cloud-based machine learning models for a wide range of applications.
☆20 Updated 2 years ago
Alternatives and similar repositories for docai
Users that are interested in docai are comparing it to the libraries listed below
- ☆22 Updated last year
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆72 Updated this week
- CRUD Word documents with Python ☆11 Updated 9 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆128 Updated last year
- Repository for deepdoctection tutorial notebooks ☆46 Updated 2 months ago
- Trained BERT and Word2Vec legal clause classifiers for SPACY using the Atticus Project's Open Source Contract Label Corpus ☆14 Updated 4 years ago
- ☆13 Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆22 Updated 11 months ago
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 Updated last year
- Universal text classifier for generative models ☆24 Updated last year
- ☆95 Updated 5 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆109 Updated last year
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions ☆45 Updated last year
- ☆49 Updated 11 months ago
- Visual similarity search engine demo with use of PyTorch Metric Learning and Qdrant ☆12 Updated 2 years ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆46 Updated last year
- GLiNER model in a FastAPI microservice. ☆45 Updated 8 months ago
- Pandas-LLM ☆46 Updated 2 years ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated 10 months ago
- ☆11 Updated 9 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 Updated 4 years ago
- ☆19 Updated 4 months ago
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆24 Updated last year
- Unstract's interface to LLMs, Embeddings and VectorDBs. ☆18 Updated last year
- Convert any image into a Region Adjacency Graph (RAG) ☆12 Updated 5 years ago
- 🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0. ☆50 Updated this week
- ☆17 Updated 4 years ago
- a streaming markdown component for streamlit with LaTeX, Mermaid, Table, code support. A drop-in replacement for st.markdown. ☆24 Updated 6 months ago
- A chatbot made using the Chatterbot library in Python and locally hosted using Streamlit. Dataset used were collected during ConvAI2 comp… ☆15 Updated 4 years ago
- Code for the EMNLP'24 paper "Learning to Extract Structured Entities Using Language Models" ☆42 Updated 4 months ago