Plug-and-play document AI with zero-shot models.
☆124 · Feb 16, 2026 · Updated last week
Alternatives and similar repositories for sieves
Users who are interested in sieves compare it to the libraries listed below.
- FlexiTokens ☆18 · Dec 27, 2025 · Updated 2 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆26 · Mar 6, 2025 · Updated 11 months ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆28 · Nov 18, 2025 · Updated 3 months ago
- spaCy entry points for Curated Transformers ☆32 · May 28, 2025 · Updated 9 months ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆78 · Feb 10, 2026 · Updated 2 weeks ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆21 · Aug 15, 2024 · Updated last year
- Parent repository for the MOJ Analytics Platform ☆14 · Nov 16, 2021 · Updated 4 years ago
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs … ☆63 · Feb 6, 2025 · Updated last year
- ☆23 · Jan 2, 2023 · Updated 3 years ago
- The privacy-preserving record linkage toolkit: a proof-of-concept public demo of next-gen data linkage techniques. ☆15 · May 22, 2024 · Updated last year
- A curated list of materials on AI guardrails ☆45 · Jun 3, 2025 · Updated 8 months ago
- Customize, control, and enhance LLM generation with logits processors, featuring visualization capabilities to inspect and understand sta… ☆44 · Jan 8, 2026 · Updated last month
- ☆68 · Mar 17, 2022 · Updated 3 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆157 · May 24, 2024 · Updated last year
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Jan 24, 2024 · Updated 2 years ago
- 🔢 Work with static vector models ☆37 · Apr 21, 2025 · Updated 10 months ago
- Instant redline with AI summary ☆37 · Dec 7, 2025 · Updated 2 months ago
- C inference engine for running GLiClass (Generalist and Lightweight Classification) models ☆16 · May 21, 2025 · Updated 9 months ago
- Evaluation framework for document processing models and services. ☆63 · Feb 12, 2026 · Updated 2 weeks ago
- KL3M training data collection and preprocessing ☆20 · Apr 14, 2025 · Updated 10 months ago
- Generate Python data structures and XML parser from Xschema (Python 3 port) ☆12 · Jan 13, 2015 · Updated 11 years ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆56 · Jul 31, 2023 · Updated 2 years ago
- spaCy extension for Visual Studio Code ☆32 · Mar 10, 2025 · Updated 11 months ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆36 · Mar 27, 2024 · Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆105 · Apr 23, 2024 · Updated last year
- Legal Matter Standard Specification (LMSS) library for Python ☆17 · Nov 14, 2023 · Updated 2 years ago
- Simple GRPO ☆12 · May 28, 2025 · Updated 9 months ago
- ☆10 · Oct 22, 2024 · Updated last year
- Content for the NumPy newsletter, which anyone can sign up for in the numpy.org footer ☆14 · Jul 20, 2023 · Updated 2 years ago
- A library for data streaming and augmentation ☆21 · May 5, 2025 · Updated 9 months ago
- Just some nice dice in Python ☆21 · Jan 6, 2026 · Updated last month
- Generate reports for spaCy models. ☆29 · May 27, 2022 · Updated 3 years ago
- SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data For High-Stakes Domains (EMNLP 2025 System Demonstration) ☆26 · Nov 3, 2025 · Updated 3 months ago
- ☆15 · May 8, 2019 · Updated 6 years ago
- A CLI for generating synthetic data ☆43 · May 14, 2025 · Updated 9 months ago
- Fast Multimodal Semantic Deduplication & Filtering ☆890 · Jan 20, 2026 · Updated last month
- Simple script to generate a projection of the beds required to support a given trajectory of COVID-19 cases requiring hospitalisation ☆17 · Mar 22, 2020 · Updated 5 years ago
- General repository for Data Science Bootcamps at Coding Dojo ☆11 · Nov 13, 2025 · Updated 3 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago