butlerlabs / docai
DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning models for a wide range of applications
☆20 · Updated 2 years ago
Alternatives and similar repositories for docai:
Users who are interested in docai are comparing it to the libraries listed below.
- ☆22 · Updated last year
- An unofficial implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 · Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆23 · Updated 6 months ago
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆17 · Updated last year
- Repository for deepdoctection tutorial notebooks ☆44 · Updated 5 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆67 · Updated 3 weeks ago
- Visual similarity search engine demo using PyTorch Metric Learning and Qdrant ☆12 · Updated 2 years ago
- ☆15 · Updated 3 years ago
- Trained BERT and Word2Vec legal clause classifiers for spaCy using the Atticus Project's Open Source Contract Label Corpus ☆14 · Updated 4 years ago
- A chatbot made using the Chatterbot library in Python and locally hosted using Streamlit. Datasets used were collected during ConvAI2 comp… ☆15 · Updated 3 years ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆13 · Updated 8 months ago
- Unstract's interface to LLMs, Embeddings and VectorDBs. ☆18 · Updated 9 months ago
- AI_Powered_Dev_Search_Engine ☆12 · Updated last year
- A streaming markdown component for Streamlit with LaTeX, Mermaid, table, and code support. A drop-in replacement for st.markdown. ☆16 · Updated 2 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆50 · Updated last month
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆11 · Updated 8 months ago
- Automated PDF and text processing with spaCy and NLTK; information extraction from text based on grammatical structure; deployed on extra… ☆16 · Updated 3 years ago
- Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents. ☆25 · Updated last year
- Pandas-LLM ☆42 · Updated last year
- 💙 Unstructured Data Connectors for Haystack 2.0 ☆16 · Updated last year
- Python package to parse PDFs with different parsers ☆35 · Updated 4 months ago
- Demos of some issues with LangChain. ☆31 · Updated last year
- ☆11 · Updated last year
- An open source framework for Retrieval-Augmented System (RAG) that uses semantic search to help retrieve the expected results and generate h… ☆20 · Updated 9 months ago
- Solve Geometric & Graph Problems with Large Language Models ☆29 · Updated 2 years ago
- Tools for merging pretrained large language models. ☆19 · Updated 10 months ago
- Scripts for reading, extracting, and organizing data from either HTML or PDF documents and preparing them to be converted into embeddings f… ☆13 · Updated 8 months ago
- AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an… ☆13 · Updated this week
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆13 · Updated last year
- Luann allows you to create an LLM agent, which has a complete memory module (long-term memory, short-term memory) and knowledge module (Variou… ☆21 · Updated last month