explosion / spacy-layout
Process PDFs, Word documents and more with spaCy
★ 844 · Updated 10 months ago
Alternatives and similar repositories for spacy-layout
Users who are interested in spacy-layout are comparing it to the libraries listed below.
- 🦙 Integrating LLMs into structured NLP pipelines · ★ 1,362 · Updated last year
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) · ★ 482 · Updated 6 months ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) · ★ 255 · Updated 7 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. · ★ 544 · Updated 3 months ago
- SpanMarker for Named Entity Recognition · ★ 465 · Updated last year
- Unified Schema-Based Information Extraction · ★ 566 · Updated last week
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 · ★ 2,737 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs · ★ 232 · Updated last week
- A spaCy wrapper for GliNER · ★ 129 · Updated last year
- Fast Multimodal Semantic Deduplication & Filtering · ★ 877 · Updated last week
- Late Interaction Models Training & Retrieval · ★ 693 · Updated 3 weeks ago
- Extract structured text from pdfs quickly · ★ 653 · Updated 7 months ago
- Easily deploy Haystack pipelines as REST APIs and MCP Tools. · ★ 135 · Updated this week
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. … · ★ 470 · Updated last month
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… · ★ 933 · Updated 3 weeks ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy · ★ 1,465 · Updated last month
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. · ★ 1,480 · Updated 5 months ago
- Docling core data types and transformations · ★ 223 · Updated this week
- Benchmarking PDF libraries · ★ 321 · Updated 6 months ago
- 👩🏻‍🍳 A collection of example notebooks using Haystack · ★ 520 · Updated 2 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. · ★ 1,590 · Updated last month
- ★ 180 · Updated last week
- PyMuPDF4LLM · ★ 1,243 · Updated 3 weeks ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. · ★ 841 · Updated last year
- A very simple news crawler with a funny name · ★ 432 · Updated this week
- Fast State-of-the-Art Static Embeddings · ★ 1,990 · Updated last month
- ⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion · ★ 640 · Updated 5 months ago
- Visualize Different Text Splitting Methods · ★ 318 · Updated last year
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding · ★ 2,665 · Updated 3 weeks ago
- A Python client for the Unstructured Platform API · ★ 112 · Updated this week