PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them
☆81Apr 27, 2025Updated last year
Alternatives and similar repositories for spark-pdf
Users that are interested in spark-pdf are comparing it to the libraries listed below.
- ☆21Apr 18, 2026Updated 2 weeks ago
- The Lightning Catalog is an open-source data catalog designed for preparing data at any scale in ad-hoc analytics, data virtualization, …☆36Feb 5, 2026Updated 3 months ago
- Notebook Discovery Tool for Databricks notebooks☆19Jul 14, 2022Updated 3 years ago
- A flake8 plugin that detects usage of withColumn in a loop or inside reduce☆28Jun 20, 2025Updated 10 months ago
- ☆18Jun 16, 2024Updated last year
- Custom PySpark Connectors☆97Mar 3, 2026Updated 2 months ago
- Delta Lake helper methods in PySpark☆328Jan 19, 2026Updated 3 months ago
- Magic to help Spark pipelines upgrade☆34Sep 29, 2024Updated last year
- Lahinch surf predictions with Hopsworks☆15May 21, 2025Updated 11 months ago
- A Spark connector for the Azure Common Data Model☆15May 31, 2023Updated 2 years ago
- ☆17Nov 26, 2024Updated last year
- A Chrome extension that takes an image and turns it into a CSV☆45Aug 31, 2025Updated 8 months ago
- Generate and compare Debezium CDC (Change Data Capture) Avro schemas directly from your database.☆26Updated this week
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily☆113Updated this week
- Tools for Microsoft Fabric☆25Jul 17, 2025Updated 9 months ago
- Hackerrank, Coursera, other studies☆13Aug 19, 2021Updated 4 years ago
- An IoT Edge Module that generates sample data using [Bogus](https://github.com/bchavez/Bogus)☆10Dec 8, 2022Updated 3 years ago
- Integration of Iceberg table management into Spark SQL☆11Jan 21, 2020Updated 6 years ago
- ☆11Nov 26, 2024Updated last year
- End-to-end proof of concept showing core MLOps practices to develop, deploy and monitor a machine learning model for an employee retentio…☆17May 28, 2024Updated last year
- Generate mock data based on an Apache Avro schema and specific cardinality settings☆10Apr 16, 2018Updated 8 years ago
- ☆12Aug 6, 2020Updated 5 years ago
- ☆11Oct 19, 2023Updated 2 years ago
- Making Databricks easy to use for R developers.☆26Oct 6, 2022Updated 3 years ago
- Genie Framework improves Spark Pool utilization by executing multiple Synapse notebooks on the same spark pool instance☆28Dec 19, 2023Updated 2 years ago
- Preparatory notes for the Cloudera Spark and Hadoop Certification☆18Dec 5, 2018Updated 7 years ago
- GraphQL to SPARQL bridge☆24Feb 9, 2022Updated 4 years ago
- ☆10Nov 2, 2023Updated 2 years ago
- ☆11Feb 14, 2020Updated 6 years ago
- Port of MIT's xv6 OS to 32-bit RISC-V☆12Feb 12, 2023Updated 3 years ago
- example of a Microsoft Fabric Solution☆33Dec 28, 2025Updated 4 months ago
- This is the public repo of the code from ReasonKGE☆16Sep 18, 2021Updated 4 years ago
- Run Local Kafka and Kpow with Docker Compose☆19Apr 28, 2026Updated last week
- CloudFormation script for Solr servers☆17Jul 1, 2015Updated 10 years ago
- Local AWS - a lightweight AWS service emulator☆43Apr 26, 2026Updated last week
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.☆37Jan 3, 2023Updated 3 years ago
- We will see how we can show the real-time data from our IoT device in an Angular application using Azure SignalR service and Azure Functi…☆13Jan 3, 2019Updated 7 years ago
- A Python CLI application that demonstrates how you can access AWS services, such as Amazon S3 and Amazon Athena, using trusted identity p…☆13Mar 11, 2025Updated last year
- Docker Compose files to compose a NiFi cluster on Docker.☆35May 29, 2017Updated 8 years ago