Mirror of Apache DataFu
☆121 · May 20, 2025 · Updated 9 months ago
Alternatives and similar repositories for datafu
Users who are interested in datafu are comparing it to the libraries listed below.
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 · Sep 4, 2023 · Updated 2 years ago
- Hadoop InputFormat for http://druid.io/ ☆10 · Oct 26, 2016 · Updated 9 years ago
- Sample code for working with HBase Thrift. ☆15 · Jul 25, 2013 · Updated 12 years ago
- IPython magics to work with DBT ☆15 · Jul 22, 2022 · Updated 3 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w… ☆16 · Oct 3, 2025 · Updated 4 months ago
- Tools for building, packaging, and OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S. ☆18 · Mar 27, 2024 · Updated last year
- Source code for the website geminibyexample.com which provides simple Python code examples for the Gemini SDK ☆22 · Apr 8, 2025 · Updated 10 months ago
- Apache XML Graphics Commons ☆20 · Dec 12, 2025 · Updated 2 months ago
- Column-wise type annotations for pyspark DataFrames ☆95 · Updated this week
- Benchmarks for Bref running on AWS Lambda ☆20 · Sep 4, 2025 · Updated 5 months ago
- Mirror of Apache Crunch (Incubating) ☆109 · Feb 2, 2021 · Updated 5 years ago
- A common security infrastructure used by Spring Cloud Data Flow and the projects in its ecosystem ☆19 · Apr 1, 2025 · Updated 11 months ago
- DEPRECATED: Open source Apache Cassandra running on DC/OS is now replaced by mesosphere/dcos-commons/frameworks/cassandra. This repositor… ☆116 · May 1, 2019 · Updated 6 years ago
- Examples of Spark 3.0 ☆45 · Nov 11, 2020 · Updated 5 years ago
- Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka ☆25 · Oct 16, 2020 · Updated 5 years ago
- Mirror of Apache VXQuery ☆20 · Jan 11, 2019 · Updated 7 years ago
- ☆20 · Sep 23, 2018 · Updated 7 years ago
- A functional wrapper around Spark to make it work with ZIO ☆52 · Updated this week
- Cloudera Director API clients ☆17 · May 20, 2022 · Updated 3 years ago
- SparkFHE project demo examples ☆23 · Dec 18, 2025 · Updated 2 months ago
- Mirror of Apache Knox ☆212 · Updated this week
- Discover Flink clusters on Hadoop YARN for Prometheus ☆23 · Aug 5, 2020 · Updated 5 years ago
- Mirror of Apache Hama ☆132 · Feb 11, 2020 · Updated 6 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 · Jul 7, 2021 · Updated 4 years ago
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 5 months ago
- A better bridge between Apache Spark and PostgreSQL ☆23 · Sep 11, 2023 · Updated 2 years ago
- ⛩ Developer mediated access to the Oasis Platform ☆23 · Oct 30, 2020 · Updated 5 years ago
- A Python PySpark Project with Poetry ☆27 · Feb 17, 2026 · Updated last week
- Simplify getting Zeppelin up and running ☆56 · Jul 20, 2016 · Updated 9 years ago
- Druid indexing plugin for using Spark in batch jobs ☆101 · Oct 21, 2021 · Updated 4 years ago
- GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs ☆1,135 · Feb 6, 2026 · Updated 3 weeks ago
- GeoTrellis PointCloud library to work with any pointcloud data on Spark ☆27 · Oct 5, 2020 · Updated 5 years ago
- Piglet is a DSL for writing Pig scripts in Ruby ☆83 · Jul 21, 2010 · Updated 15 years ago
- Stratosphere is now Apache Flink. ☆198 · Dec 16, 2023 · Updated 2 years ago
- Mirror of Apache Tajo ☆135 · May 11, 2020 · Updated 5 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Apr 23, 2019 · Updated 6 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆347 · May 31, 2024 · Updated last year
- Code to collect and analyze traceroute data within a network topology ☆28 · Nov 20, 2018 · Updated 7 years ago
- Point-in-Time optimizations for Apache Spark ☆30 · Jan 18, 2024 · Updated 2 years ago