Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆4,793 · Updated last month
Related projects:
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,193 · Updated 2 months ago
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆3,598 · Updated 3 weeks ago
- This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table … ☆1,484 · Updated 3 years ago
- A curated list of resources for Document Understanding (DU) topic ☆1,258 · Updated last year
- A Repo For Document AI ☆2,500 · Updated this week
- Transforms PDF, Documents and Images into Enriched Structured Data ☆5,758 · Updated 9 months ago
- OpenMMLab Text Detection, Recognition and Understanding Toolbox ☆4,286 · Updated 2 months ago
- Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022 ☆5,707 · Updated 2 months ago
- 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows ☆8,698 · Updated last week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables. ☆6,369 · Updated 3 weeks ago
- AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file conver… ☆16,739 · Updated this week
- Top2Vec learns jointly embedded topic, document and word vectors. ☆2,919 · Updated 4 months ago
- A machine learning software for extracting information from scholarly documents ☆3,433 · Updated this week
- Document Layout Analysis resources repos for development with PdfPig. ☆571 · Updated 11 months ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆19,545 · Updated 3 weeks ago
- Rapid fuzzy string matching in Python using various string metrics ☆2,614 · Updated this week
- 📄 🤖 Semantic search and workflows for medical/scientific papers ☆1,271 · Updated 9 months ago
- The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels. ☆9,420 · Updated last week
- A Python library to extract tabular data from PDFs ☆2,919 · Updated last month
- Text preprocessing, representation and visualization from zero to hero. ☆2,880 · Updated last year
- State-of-the-Art Text Embeddings ☆14,861 · Updated this week
- Leveraging BERT and c-TF-IDF to create easily interpretable topics. ☆5,991 · Updated 3 weeks ago
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ☆8,888 · Updated this week
- Efficient few-shot learning with Sentence Transformers ☆2,143 · Updated this week
- DocBank: A Benchmark Dataset for Document Layout Analysis ☆558 · Updated last month
- A data augmentations library for audio, image, text, and video. ☆4,940 · Updated last week
- A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model. ☆1,386 · Updated last month
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ☆18,242 · Updated this week
- An open-source, low-code machine learning library in Python ☆8,827 · Updated 2 weeks ago