Point-in-Time optimizations for Apache Spark
☆30 Jan 18, 2024 Updated 2 years ago
Alternatives and similar repositories for spark-pit
Users who are interested in spark-pit are comparing it to the libraries listed below.
- Python - Java/Scala API for the Hopsworks feature store ☆55 Sep 24, 2025 Updated 5 months ago
- Ultra-high-performance local IPC framework with Zipkin tracing to conduct a beautiful symphony of (brotherhood) build tooling. ☆10 Jan 8, 2021 Updated 5 years ago
- something to help you spark ☆64 Oct 23, 2018 Updated 7 years ago
- Distributed solver library for large-scale structured output prediction, based on Spark. Project website: ☆17 Mar 3, 2016 Updated 10 years ago
- ☆30 Dec 4, 2024 Updated last year
- A python library bakeoff for medium sized datasets ☆24 Aug 25, 2023 Updated 2 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 Sep 6, 2024 Updated last year
- Use pyarrow with Azure Data Lake gen2 ☆28 Jun 27, 2024 Updated last year
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 Nov 4, 2024 Updated last year
- Core Gwen interpreter ☆36 Jan 7, 2026 Updated 2 months ago
- a curated list of awesome lakehouse frameworks, applications, etc ☆42 Feb 9, 2026 Updated 3 weeks ago
- Python library & CLI to create, view and edit PFB files ☆12 Feb 19, 2026 Updated 2 weeks ago
- EncryCore node reference implementation ☆15 Apr 2, 2020 Updated 5 years ago
- Python Package to Share/Edit Pandas/Polars DF with web interface! ☆11 Jun 10, 2025 Updated 8 months ago
- A package that enables the use of SIMD x86 instructions in the Lightweight Modular Staging Framework (LMS). ☆40 Apr 19, 2018 Updated 7 years ago
- Data-Driven Spark allows quick data exploration based on Apache Spark. ☆29 Jan 6, 2017 Updated 9 years ago
- Benchmarks of artificial neural network library for Spark MLlib ☆11 Dec 3, 2015 Updated 10 years ago
- This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, … ☆17 Feb 5, 2026 Updated last month
- A command-line interface for interacting with the NeoLoad Web Platform...running tests, reporting results, etc...on your workstation or i… ☆10 Jan 20, 2026 Updated last month
- How to customize Tableau authentication using the AWS Athena's JDBC Credentials Provider capabilities. ☆14 Jun 8, 2020 Updated 5 years ago
- FTRL-Proximal Online Learning Algorithm ☆15 May 22, 2017 Updated 8 years ago
- Factorization Machines for Julia ☆11 Aug 26, 2016 Updated 9 years ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆37 Mar 9, 2021 Updated 4 years ago
- Proximal Asynchronous SAGA ☆13 Nov 30, 2017 Updated 8 years ago
- Example for baking the current git commit hash into a bazel C++ project ☆11 Jan 25, 2022 Updated 4 years ago
- ☆11 Nov 26, 2024 Updated last year
- Associated blog post - https://tristanrhodes.com/blog/Adventures-in-Algorithmic-Trading-on-the-Runescape-Grand-Exchange ☆10 Oct 14, 2024 Updated last year
- ☆11 Dec 23, 2017 Updated 8 years ago
- A Framework for building Distributed Consensus Protocols ☆10 Oct 13, 2017 Updated 8 years ago
- A repository for all code generated at our Datadive events ☆36 May 12, 2012 Updated 13 years ago
- The code for the in memory data pipeline that was presented at Berlin Buzzwords 2015. ☆10 Jun 1, 2015 Updated 10 years ago
- Code for the "Sample-efficient Integration of New Modalities into Large Language Models" paper ☆16 Sep 8, 2025 Updated 5 months ago
- Gain information about applications to inform deployments ☆11 Mar 3, 2022 Updated 4 years ago
- Hierarchical Image Representation ☆10 Dec 9, 2023 Updated 2 years ago
- This is a POC to test pgTAP (I use Docker image) to write and execute PL/pgSQL - SQL Procedural Language. ☆11 Mar 1, 2018 Updated 8 years ago
- A Configuration System for Airflow ☆16 Updated this week
- Mirror of Apache Spark ☆11 Jan 1, 2026 Updated 2 months ago
- Course Materials for the MMCi Practical Data Science Course ☆19 Apr 10, 2020 Updated 5 years ago
- Helper for handling PySpark DataFrame partition size 📑🎛️ ☆12 Mar 8, 2024 Updated last year