PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them
☆80Apr 27, 2025Updated 11 months ago
Alternatives and similar repositories for spark-pdf
Users that are interested in spark-pdf are comparing it to the libraries listed below.
- ScaleDP is an Open-Source extension of Apache Spark for Document Processing☆18Dec 2, 2025Updated 3 months ago
- Notebook Discovery Tool for Databricks notebooks☆19Jul 14, 2022Updated 3 years ago
- Repository for Databricks And Azure Maps Online Workshop Series☆17Mar 21, 2022Updated 4 years ago
- ☆18Jun 16, 2024Updated last year
- Tool for visualizing Apache Oozie pipelines☆12Feb 15, 2016Updated 10 years ago
- Custom PySpark Connectors☆92Mar 3, 2026Updated 3 weeks ago
- Delta Lake helper methods in PySpark☆328Jan 19, 2026Updated 2 months ago
- good lecture☆16Mar 31, 2025Updated 11 months ago
- Lahinch surf predictions with Hopsworks☆15May 21, 2025Updated 10 months ago
- A Spark connector for the Azure Common Data Model☆15May 31, 2023Updated 2 years ago
- ☆17Nov 26, 2024Updated last year
- A Chrome extension that takes an image and turns it into a CSV☆45Aug 31, 2025Updated 6 months ago
- Generate and Compare Debezium CDC (Change Data Capture) Avro Schema, directly from your Database.☆24Updated this week
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily☆112Updated this week
- A small example setting Python's logging configuration using a module invoked from a notebook.☆10May 14, 2023Updated 2 years ago
- Notebooks for querying Fabric APIs and storing data in Fabric Lakehouses☆25May 20, 2024Updated last year
- Minutely clientside OpenStreetMap changeset streams☆19Apr 15, 2023Updated 2 years ago
- Integration of Iceberg table management into Spark SQL☆11Jan 21, 2020Updated 6 years ago
- Collection of NiFi-related stuff☆24Oct 27, 2022Updated 3 years ago
- End-to-end proof of concept showing core MLOps practices to develop, deploy and monitor a machine learning model for an employee retentio…☆15May 28, 2024Updated last year
- Python Package for ducklake☆20Jun 5, 2025Updated 9 months ago
- ☆12Aug 6, 2020Updated 5 years ago
- ☆11Oct 19, 2023Updated 2 years ago
- Making Databricks easy to use for R developers.☆26Oct 6, 2022Updated 3 years ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …☆12Mar 18, 2026Updated last week
- A Python CLI application that demonstrates how you can access AWS services, such as Amazon S3 and Amazon Athena, using trusted identity p…☆12Mar 11, 2025Updated last year
- GraphQL to SPARQL bridge☆24Feb 9, 2022Updated 4 years ago
- PyTorch implementation of a BiLSTM model for the Wikification project.☆19Mar 30, 2020Updated 5 years ago
- ☆11Feb 14, 2020Updated 6 years ago
- Port of MIT's xv6 OS to 32-bit RISC-V☆12Feb 12, 2023Updated 3 years ago
- λFS: an elastic, high-performance, serverless-function-based metadata service for large-scale distributed file systems (ACM ASPLOS'23)☆14Apr 2, 2025Updated 11 months ago
- example of a Microsoft Fabric Solution☆32Dec 28, 2025Updated 2 months ago
- Run Local Kafka and Kpow with Docker Compose☆19May 29, 2025Updated 9 months ago
- Local AWS - a lightweight AWS service emulator☆35Updated this week
- Cloud formation script for solr servers☆17Jul 1, 2015Updated 10 years ago
- Vfsvisaonline automated appointment checker tool [Python script]☆24Dec 10, 2020Updated 5 years ago
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.☆37Jan 3, 2023Updated 3 years ago
- ☆28Oct 14, 2024Updated last year
- We will see how we can show the real-time data from our IoT device in an Angular application using Azure SignalR service and Azure Functi…☆13Jan 3, 2019Updated 7 years ago