StabRise / spark-pdf
PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and can OCR them
☆78 · Apr 27, 2025 · Updated 9 months ago
Alternatives and similar repositories for spark-pdf
Users who are interested in spark-pdf are comparing it to the libraries listed below.
- ScaleDP is an Open-Source extension of Apache Spark for Document Processing ☆17 · Dec 2, 2025 · Updated 2 months ago
- The Lightning Catalog is an open-source data catalog designed for preparing data at any scale in ad-hoc analytics, data virtualization, … ☆36 · Feb 5, 2026 · Updated last week
- ☆20 · Jan 31, 2026 · Updated last week
- Tool for visualizing Apache Oozie pipelines ☆12 · Feb 15, 2016 · Updated 9 years ago
- Notebook Discovery Tool for Databricks notebooks ☆19 · Jul 14, 2022 · Updated 3 years ago
- Magic to help Spark pipelines upgrade ☆34 · Sep 29, 2024 · Updated last year
- Lahinch surf predictions with Hopsworks ☆15 · May 21, 2025 · Updated 8 months ago
- A Spark connector for the Azure Common Data Model ☆15 · May 31, 2023 · Updated 2 years ago
- Delta Lake helper methods in PySpark ☆327 · Jan 19, 2026 · Updated 3 weeks ago
- Notebooks for querying Fabric APIs and storing data in Fabric Lakehouses ☆25 · May 20, 2024 · Updated last year
- A Chrome extension that takes an image and turns it into a CSV ☆44 · Aug 31, 2025 · Updated 5 months ago
- Collection of NiFi-related stuff ☆24 · Oct 27, 2022 · Updated 3 years ago
- Tools for Microsoft Fabric ☆24 · Jul 17, 2025 · Updated 6 months ago
- Custom PySpark Data Sources ☆85 · Jan 31, 2026 · Updated 2 weeks ago
- SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine. ☆26 · Feb 22, 2025 · Updated 11 months ago
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆235 · Jan 24, 2025 · Updated last year
- ☆28 · Oct 14, 2024 · Updated last year
- Implementation of core-expansion algorithm ☆11 · Jan 26, 2026 · Updated 2 weeks ago
- Russian words synonyms and antonyms ☆11 · Dec 7, 2021 · Updated 4 years ago
- Docker Compose files for composing a NiFi cluster on Docker. ☆35 · May 29, 2017 · Updated 8 years ago
- [Prototype exploration] Uses MPEG-DASH SRD to transmit and render only the panoramic video tiles within the user's viewing FOV. Powered By dash-srd.js ☆10 · Oct 6, 2020 · Updated 5 years ago
- Data profiling tools for Big Data ☆11 · Nov 17, 2025 · Updated 2 months ago
- A very simple & fast WebTTL based on the ESP8266 WiFi module, coded with Arduino ☆10 · Apr 22, 2022 · Updated 3 years ago
- An SBT Plugin that acts as a light wrapper around Buf. ☆10 · Oct 29, 2024 · Updated last year
- Platform for creating audio-first AI assistants that can work offline using a flexible plugin architecture ☆13 · Jun 29, 2025 · Updated 7 months ago
- A fully open-source, self-hostable data lakehouse for local development and testing of modern data workflows ☆28 · Jan 26, 2026 · Updated 2 weeks ago
- Game of Life in Java: Solution for Coderetreat facilitators ☆11 · Mar 27, 2023 · Updated 2 years ago
- CONFSEC's ComputeNode component of the OpenPCC standard ☆17 · Dec 15, 2025 · Updated last month
- SipDemo, made to illustrate VoIP on an Android device. ☆11 · Feb 6, 2014 · Updated 12 years ago
- End-to-end proof of concept showing core MLOps practices to develop, deploy and monitor a machine learning model for an employee retentio… ☆15 · May 28, 2024 · Updated last year
- Integration of Iceberg table management into Spark SQL ☆11 · Jan 21, 2020 · Updated 6 years ago
- Analyzing the most strategic words to guess on Wordle, based on letter frequency distributions ☆11 · Feb 20, 2022 · Updated 3 years ago
- ☆10 · Jan 23, 2023 · Updated 3 years ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 · Jan 3, 2023 · Updated 3 years ago
- A Spark plugin for reading and writing Excel files ☆520 · Feb 4, 2026 · Updated last week
- 🧊 Tests for stpyvista deployed in streamlit community cloud ☆13 · Updated this week
- Demo application using cordova and ionic. ☆19 · Jan 12, 2015 · Updated 11 years ago
- Dev tools and cheatsheet of common Linux commands ☆10 · Jan 10, 2026 · Updated last month
- ☆25 · Nov 16, 2025 · Updated 2 months ago