explosion / spacy-layout
Process PDFs, Word documents and more with spaCy
☆784 · Updated 7 months ago
Alternatives and similar repositories for spacy-layout
Users interested in spacy-layout are comparing it to the libraries listed below.
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆245 · Updated 4 months ago
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ☆468 · Updated 3 months ago
- Integrating LLMs into structured NLP pipelines ☆1,328 · Updated 9 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆400 · Updated this week
- A spaCy wrapper for GLiNER ☆123 · Updated 9 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,459 · Updated last week
- A Python library to define and validate data types in Docling. ☆198 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆209 · Updated last week
- SpanMarker for Named Entity Recognition ☆460 · Updated 9 months ago
- Fast Semantic Text Deduplication & Filtering ☆823 · Updated 3 weeks ago
- Extract structured text from PDFs quickly ☆614 · Updated 4 months ago
- A very simple news crawler with a funny name ☆415 · Updated this week
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy ☆1,368 · Updated last month
- ☆158 · Updated last week
- Benchmarking PDF libraries ☆314 · Updated 4 months ago
- A collection of example notebooks using Haystack ☆506 · Updated 3 weeks ago
- PyMuPDF4LLM ☆1,089 · Updated last month
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆436 · Updated 7 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆877 · Updated last month
- weasel: A small and easy workflow system ☆87 · Updated last year
- Fast State-of-the-Art Static Embeddings ☆1,882 · Updated 3 weeks ago
- Running Docling as an API service ☆871 · Updated last week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,448 · Updated 2 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆827 · Updated 9 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆127 · Updated 3 months ago
- Automatically annotate papers using LLMs ☆358 · Updated 6 months ago
- Easily deploy Haystack pipelines as REST APIs and MCP Tools. ☆120 · Updated this week
- The robust European language model benchmark. ☆133 · Updated this week
- Zero and Few shot named entity & relationships recognition ☆391 · Updated last month
- Late Interaction Models Training & Retrieval ☆632 · Updated this week