Document Filters is an SDK for applications that extract data from unstructured sources, such as content indexing, e-discovery, data migration, and feeding data into AI/ML models. It supports deep inspection, data extraction, output manipulation, and conversion for virtually any type of document, from any programming language.
☆26 · Feb 18, 2026 · Updated 2 months ago
Alternatives and similar repositories for DocumentFilters
Users interested in DocumentFilters are comparing it to the libraries listed below.
- Experimental command suggestion system based on historical usage of commands in certain locations. ☆12 · Feb 18, 2026 · Updated 2 months ago
- Interactive git commands using fzf, available as a zsh plugin ☆18 · Mar 17, 2024 · Updated 2 years ago
- QtSemanticNotes is a personal knowledge base, personal wiki or just note-taking application that features automatic linking, tree view an… ☆19 · Dec 18, 2017 · Updated 8 years ago
- Highly concurrent and fast content processing for Mighty Inference Server ☆10 · Feb 6, 2023 · Updated 3 years ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning ☆14 · Aug 17, 2023 · Updated 2 years ago
- Wikimedia Enterprise client SDK in Python ☆21 · Mar 24, 2026 · Updated last month
- Blazing fast signature detection ☆11 · Sep 5, 2022 · Updated 3 years ago
- Formula for estimating how easy a text is to read, based on the Coleman-Liau index (1975) ☆14 · Nov 1, 2022 · Updated 3 years ago
- Containerfile for the Vanilla OS Desktop+Nvidia image. ☆17 · Apr 2, 2026 · Updated last month
- Official library of images for the SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019) ☆13 · Jul 7, 2019 · Updated 6 years ago
- Run greatexpectations.io on any SQL engine through a REST API, built on FastAPI, Pydantic, and SQLAlchemy, as a data quality tool ☆14 · Dec 12, 2025 · Updated 4 months ago
- Simple web crawler in Go based on text density ☆13 · Mar 19, 2023 · Updated 3 years ago
- Particle Syntax Website ☆16 · Apr 12, 2026 · Updated 3 weeks ago
- C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs ☆11 · Jan 13, 2023 · Updated 3 years ago
- Dataset from the Tip of the Tongue Known-Item Retrieval (2021) paper. ☆12 · Nov 4, 2021 · Updated 4 years ago
- Prevent XSS attacks by sanitizing HTML (this is different from escaping!) ☆22 · Oct 14, 2023 · Updated 2 years ago
- Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory. ☆11 · Updated this week
- A UI designer for constructing AI applications with OpenSearch ☆16 · Updated this week
- Encryption that converts English characters to Unicode characters that mimic their appearance ☆12 · Sep 17, 2017 · Updated 8 years ago
- How to backdoor Diffie-Hellman: lessons learned from the Socat non-prime prime ☆11 · Jun 29, 2021 · Updated 4 years ago
- Temporal and Causal Reasoning (dataset) ☆10 · Apr 19, 2022 · Updated 4 years ago
- Xayn AI ☆18 · May 9, 2022 · Updated 3 years ago
- LLM Oracle is a GPT-4 powered tool for predicting future events. It's like a Magic 8 Ball that is able to perform basic research, calcula… ☆17 · May 27, 2023 · Updated 2 years ago
- Security research organization dedicated to finding low-hanging critical vulnerabilities. ☆15 · May 12, 2022 · Updated 3 years ago
- A service to auto-hide Hacker News articles by keyword, site, and more ☆12 · Oct 12, 2025 · Updated 6 months ago
- R library for common information retrieval metrics ☆14 · Jun 5, 2023 · Updated 2 years ago
- This repository is meant to optimize hybrid search settings for OpenSearch. It covers a grid search approach to identify a good parameter… ☆13 · Sep 1, 2025 · Updated 8 months ago
- Pure Elixir implementation of Sha3 and the original Keccak1600-f ☆16 · Jan 20, 2026 · Updated 3 months ago
- Obsidian plugin that automatically switches between preview and source mode. ☆31 · Dec 18, 2021 · Updated 4 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆18 · Jun 9, 2022 · Updated 3 years ago
- TREC Core track ☆11 · Jul 5, 2017 · Updated 8 years ago
- TUI for managing beads ☆39 · Jan 8, 2026 · Updated 3 months ago
- DALLE-tools provides useful dataset utilities to improve your workflow with WebDatasets. ☆14 · Mar 9, 2022 · Updated 4 years ago
- This is a solution accelerator for creating personalized content recommendations based on user activity. ☆13 · Mar 26, 2024 · Updated 2 years ago
- ☆14 · Feb 2, 2023 · Updated 3 years ago
- Hackable personal news reader in Bash ☆27 · Jan 20, 2026 · Updated 3 months ago
- ☆14 · May 6, 2018 · Updated 8 years ago
- Binary Ninja plugin for annotating function arguments ☆22 · Oct 20, 2024 · Updated last year
- A zsh plugin for informative terminal window titles. Over 1900 unique cloners as of Aug '25 ☆36 · May 24, 2024 · Updated last year