ScaleDP is an Open-Source extension of Apache Spark for Document Processing
☆18Dec 2, 2025Updated 6 months ago
Alternatives and similar repositories for ScaleDP
Users that are interested in ScaleDP are comparing it to the libraries listed below.
- A flake8 plugin that detects usage of withColumn in a loop or inside reduce☆28Jun 20, 2025Updated 11 months ago
- Generate and Compare Debezium CDC (Change Data Capture) Avro Schema, directly from your Database.☆27Jun 11, 2026Updated last week
- Hackerrank, Coursera, other studies☆13Aug 19, 2021Updated 4 years ago
- Sample scripts to use with Agentic Document Extraction (ADE).☆49Apr 30, 2026Updated last month
- Example gaming leaderboard application covering streaming ingestion, CDC enrichment, processing and visualisation including demo of advan…☆21Nov 18, 2025Updated 7 months ago
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily☆113Updated this week
- Demonstrating the capabilities of DuckDB as a transformation engine for data lakes☆35Oct 8, 2024Updated last year
- Pipeline for ingesting documents (like pdfs and docx) into a searchable Azure Database for vector and hybrid searching.☆39Feb 20, 2026Updated 3 months ago
- ☆23Nov 17, 2022Updated 3 years ago
- Trino On K8S Via Helm & Metastore Workshop Querying Delta Tables☆12Jan 27, 2025Updated last year
- springboot demo combined with scala and java☆11Dec 7, 2017Updated 8 years ago
- Implementation of core-expansion algorithm☆11Jan 26, 2026Updated 4 months ago
- Medical records you can copy and paste☆12Mar 3, 2023Updated 3 years ago
- A Django app to capture OAuth2 tokens for non-authentication purposes, enabling your application to act on behalf of users across externa…☆13May 11, 2026Updated last month
- [EXPERIMENTAL] This project is a PoC for a WebAssembly (Wasm) based OpenTelemetry Collector plugins.☆23Updated this week
- RSS feeds in public.☆15May 7, 2026Updated last month
- Various user contributed plugins☆13Jun 10, 2016Updated 10 years ago
- Some zig libraries☆14Sep 5, 2023Updated 2 years ago
- ☆10Dec 16, 2022Updated 3 years ago
- Bulk rename files with your favourite editor☆15Nov 12, 2025Updated 7 months ago
- DuckDB Cron Expression Extension☆28Jun 23, 2024Updated last year
- A practical example of using jsPlumb in a Vue project☆11Jan 4, 2023Updated 3 years ago
- ☆12Mar 26, 2020Updated 6 years ago
- This package contains the grammar in ANTLR g4 format and Java parser for the Data Quality Definition Language (DQDL), used by AWS Glue Da…☆23May 19, 2026Updated 3 weeks ago
- A high-performance, in-memory, git-backed OLAP database (of nothing).☆12Jan 23, 2025Updated last year
- Ambrogio is a dev agent who tackles tech debt. Starting with automatic unit tests and docstring.☆14Mar 30, 2025Updated last year
- Tier List Maker is a free open-source online tool that helps you create, customize, and share tier lists for anything you want to rank.☆63Dec 7, 2025Updated 6 months ago
- A tool to monitor and aggregate data for tens of thousands of deposit accounts across 9,200+ financial institutions in the U.S.☆11Feb 19, 2024Updated 2 years ago
- ☆40Aug 11, 2024Updated last year
- The resources of the preparation course for Databricks Data Engineer Professional certification exam☆204Dec 12, 2025Updated 6 months ago
- ☆19Jul 7, 2025Updated 11 months ago
- SchemAlchemy = Schematics + SQLAlchemy☆21Nov 20, 2017Updated 8 years ago
- Visits sessionization pipeline used for the talk☆13May 28, 2024Updated 2 years ago
- An example Flask app that uses s3-saver, url-for-s3, flask-thumbnails-s3, and flask-admin-s3-upload to store and retrieve files on Amazon…☆10Aug 28, 2015Updated 10 years ago
- ☆12Mar 10, 2019Updated 7 years ago
- MCP server for ROS to control robots via topics, services, and actions.☆35Aug 19, 2025Updated 10 months ago
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆44Apr 22, 2023Updated 3 years ago
- Neural models for predicting angular choice in road networks☆12Jan 19, 2024Updated 2 years ago
- A film cartridge designed to fit Fuji Single 8 cameras☆16Oct 25, 2025Updated 7 months ago