explosion / spacy-layout
Process PDFs, Word documents and more with spaCy
★466 Updated this week
Alternatives and similar repositories for spacy-layout:
Users who are interested in spacy-layout are comparing it to the libraries listed below.
- A spaCy wrapper for GliNER ★108 Updated last month
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ★397 Updated 5 months ago
- Fast Semantic Text Deduplication ★567 Updated last week
- 🦦 weasel: A small and easy workflow system ★75 Updated 8 months ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ★190 Updated 2 months ago
- SpanMarker for Named Entity Recognition ★421 Updated 2 months ago
- Running Docling as an API service ★140 Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ★259 Updated this week
- 🦙 Integrating LLMs into structured NLP pipelines ★1,210 Updated 2 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ★750 Updated last month
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ★260 Updated 2 months ago
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ★810 Updated this week
- ★212 Updated 3 months ago
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ★258 Updated 5 months ago
- ★118 Updated 2 weeks ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ★1,119 Updated last week
- Simple package to extract text with coordinates from programmatic PDFs ★77 Updated this week
- A Lightweight Library for AI Observability ★236 Updated 3 weeks ago
- ★174 Updated last week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,327 Updated 2 weeks ago
- Extract structured text from pdfs quickly ★433 Updated last week
- Chat with PDF files with source highlights ★128 Updated 3 months ago
- Fast State-of-the-Art Static Embeddings ★1,092 Updated last week
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ★1,852 Updated 3 weeks ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ★410 Updated last year
- 👩🏻‍🍳 A collection of example notebooks ★440 Updated this week
- Efficiently find the best-suited language model (LM) for your NLP task ★119 Updated last week
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ★214 Updated last month
- A python library to define and validate data types in Docling. ★79 Updated this week