explosion / spacy-layout
📚 Process PDFs, Word documents and more with spaCy
☆803 · Updated 8 months ago
Alternatives and similar repositories for spacy-layout
Users who are interested in spacy-layout are comparing it to the libraries listed below.
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,344 · Updated 10 months ago
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ☆472 · Updated 3 months ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆246 · Updated 5 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,518 · Updated 2 weeks ago
- SpanMarker for Named Entity Recognition ☆463 · Updated 10 months ago
- A spaCy wrapper for GLiNER ☆124 · Updated 9 months ago
- A Python library to define and validate data types in Docling. ☆204 · Updated this week
- ☆166 · Updated last week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆424 · Updated 3 weeks ago
- Fast Semantic Text Deduplication & Filtering ☆844 · Updated 3 weeks ago
- Simple package to extract text with coordinates from programmatic PDFs ☆214 · Updated 2 weeks ago
- Extract structured text from PDFs quickly ☆624 · Updated 5 months ago
- Fast State-of-the-Art Static Embeddings ☆1,907 · Updated last week
- Running Docling as an API service ☆944 · Updated 3 weeks ago
- Python bindings to PDFium, reasonably cross-platform. ☆675 · Updated this week
- Late Interaction Models Training & Retrieval ☆652 · Updated last week
- 👩🏻‍🍳 A collection of example notebooks using Haystack ☆509 · Updated last week
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆828 · Updated 9 months ago
- A Python client for the Unstructured Platform API ☆108 · Updated this week
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy ☆1,395 · Updated last week
- Easily deploy Haystack pipelines as REST APIs and MCP Tools. ☆124 · Updated this week
- 🦦 weasel: A small and easy workflow system ☆88 · Updated last week
- A very simple news crawler with a funny name ☆418 · Updated this week
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆447 · Updated 8 months ago
- Benchmarking PDF libraries ☆315 · Updated 4 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,455 · Updated 2 months ago
- Plug-and-play, zero-shot document processing pipelines. ☆113 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,572 · Updated 5 months ago
- OCR Benchmark ☆591 · Updated last month
- 📝 Automatically annotate papers using LLMs ☆361 · Updated 7 months ago