Plug-and-play document AI with zero-shot models.
☆125 · Feb 16, 2026 · Updated last month
Alternatives and similar repositories for sieves
Users who are interested in sieves are comparing it to the libraries listed below.
- Lightweight piece tokenization library ☆12 · Apr 15, 2024 · Updated last year
- Modular Rust transformer/LLM library using Candle ☆38 · May 5, 2024 · Updated last year
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆26 · Mar 6, 2025 · Updated last year
- FlexiTokens ☆18 · Dec 27, 2025 · Updated 2 months ago
- spaCy entry points for Curated Transformers ☆32 · May 28, 2025 · Updated 9 months ago
- Efficient BM25 with DuckDB 🦆 ☆65 · Dec 20, 2024 · Updated last year
- Pre-train Static Word Embeddings ☆95 · Sep 9, 2025 · Updated 6 months ago
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs … ☆64 · Feb 6, 2025 · Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies ☆29 · Nov 18, 2025 · Updated 4 months ago
- Wrapper for the macOS signpost API ☆16 · Apr 24, 2023 · Updated 2 years ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆81 · Feb 10, 2026 · Updated last month
- CMU Linguistic Annotation Backend ☆15 · Sep 22, 2025 · Updated 5 months ago
- 🔢 Work with static vector models ☆38 · Apr 21, 2025 · Updated 10 months ago
- Read and modify constituency trees in Rust. ☆10 · May 5, 2020 · Updated 5 years ago
- Customize, control, and enhance LLM generation with logits processors, featuring visualization capabilities to inspect and understand sta… ☆46 · Jan 8, 2026 · Updated 2 months ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 · Jan 24, 2024 · Updated 2 years ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆56 · Jul 31, 2023 · Updated 2 years ago
- Kernel sources for https://huggingface.co/kernels-community ☆80 · Updated this week
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆36 · Mar 27, 2024 · Updated last year
- A curated list of materials on AI guardrails ☆48 · Jun 3, 2025 · Updated 9 months ago
- Content for the NumPy newsletter, which anyone can sign up for in the numpy.org footer ☆14 · Jul 20, 2023 · Updated 2 years ago
- Evaluation framework for document processing models and services. ☆65 · Mar 11, 2026 · Updated last week
- Synthetic Text Dataset Generation for LLM projects ☆56 · Mar 10, 2026 · Updated last week
- GLiNER inference in JavaScript ☆23 · Mar 2, 2025 · Updated last year
- ☆69 · Mar 17, 2022 · Updated 4 years ago
- C inference engine for running GLiClass (Generalist and Lightweight Classification) models ☆16 · May 21, 2025 · Updated 9 months ago
- KL3M training data collection and preprocessing ☆20 · Apr 14, 2025 · Updated 11 months ago
- ☆23 · Jan 2, 2023 · Updated 3 years ago
- A full-fledged mistral+wandb ☆13 · Aug 16, 2024 · Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆157 · May 24, 2024 · Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆105 · Apr 23, 2024 · Updated last year
- The privacy-preserving record linkage toolkit: a proof-of-concept public demo of next-gen data linkage techniques. ☆16 · May 22, 2024 · Updated last year
- Retired repository for Machine Learning utils at the Wellcome Trust (now deprecated). ☆31 · Aug 9, 2023 · Updated 2 years ago
- 🦦 weasel: A small and easy workflow system ☆91 · Nov 13, 2025 · Updated 4 months ago
- Nearly Inference Free Embeddings: make your RAG queries 500x faster ☆70 · Feb 20, 2026 · Updated last month
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- ☆30 · Jun 23, 2022 · Updated 3 years ago
- ☆10 · Oct 22, 2024 · Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 · Mar 12, 2024 · Updated 2 years ago