microsoft / table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
☆2,193 · Updated 2 months ago
Related projects:
- A Repo For Document AI ☆2,492 · Updated 3 weeks ago
- Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022 ☆5,707 · Updated 2 months ago
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team… ☆1,352 · Updated last week
- A curated list of resources for Document Understanding (DU) topic ☆1,258 · Updated last year
- LLM(😽) ☆1,602 · Updated this week
- Developer APIs to Accelerate LLM Projects ☆1,329 · Updated last month
- Improved file parsing for LLMs ☆2,361 · Updated this week
- UniTable: Towards a Unified Table Foundation Model ☆338 · Updated 3 months ago
- ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23) ☆2,878 · Updated 2 weeks ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆1,318 · Updated last week
- ☆314 · Updated 8 months ago
- MTEB: Massive Text Embedding Benchmark ☆1,798 · Updated this week
- [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings ☆1,835 · Updated 3 weeks ago
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆8,446 · Updated this week
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆3,580 · Updated 3 weeks ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆2,817 · Updated 2 weeks ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆1,256 · Updated this week
- An easy way to extract information from documents ☆1,693 · Updated last year
- A blazing fast inference solution for text embeddings models ☆2,599 · Updated this week
- Efficient Retrieval Augmentation and Generation Framework ☆1,255 · Updated last week
- ☆900 · Updated 2 years ago
- General technology for enabling AI capabilities w/ LLMs and MLLMs ☆3,561 · Updated this week
- Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines ☆6,560 · Updated this week
- Document Layout Analysis resources repos for development with PdfPig. ☆571 · Updated 11 months ago
- DocBank: A Benchmark Dataset for Document Layout Analysis ☆558 · Updated last month
- Parse files for optimal RAG ☆2,450 · Updated this week
- A Unified Toolkit for Deep Learning Based Document Image Analysis ☆4,783 · Updated last month
- Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' ☆1,130 · Updated 2 weeks ago
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,072 · Updated last month
- Efficient few-shot learning with Sentence Transformers ☆2,138 · Updated last week