apache/datafu

apache / datafu

Mirror of Apache DataFu

☆124

Alternatives and similar repositories for datafu

Users that are interested in datafu are comparing it to the libraries listed below.

LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
databricks / simr
Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure
☆44 · Mar 9, 2022 · Updated 4 years ago
eljefe6a / HBaseThrift
Sample code for working with HBase Thrift.
☆15 · Jul 25, 2013 · Updated 12 years ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 · May 11, 2026 · Updated 2 months ago
implydata / druid-hadoop-inputformat
Hadoop InputFormat for http://druid.io/
☆10 · Oct 26, 2016 · Updated 9 years ago
jmockit / jmockit.github.io
Home Page and documentation for the JMockit open source project
☆12 · Feb 29, 2020 · Updated 6 years ago
apache / pig
Mirror of Apache Pig
☆687 · May 15, 2026 · Updated 2 months ago
oap-project / oap-tools
Tools for building, packaging, and OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S.
☆18 · Mar 27, 2024 · Updated 2 years ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
example42 / puppetguide-book
The [DevOps] Guide to Puppet, Universe, and Everything - BOOK
☆15 · Oct 6, 2015 · Updated 10 years ago
godatadriven / ducklake-blog-1
Example files used in the DuckDB - Unity Catalog blog
☆10 · Dec 6, 2024 · Updated last year
nsphung / pyspark-template
A Python PySpark Project with Poetry
☆31 · May 2, 2026 · Updated 2 months ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
chenharryhua / nanjin
Explore Kafka, fs2 and pure functional programming in Scala
☆34 · Updated this week
example42 / puppet-architectures
Sample Puppet Architectures Playground
☆16 · Aug 12, 2016 · Updated 9 years ago
med-at-scale / high-health
Integrate the GA4GH schemas and probably a scala impl of the service.
☆14 · May 20, 2016 · Updated 10 years ago
example42 / puppetguide-slides
The [DevOps] Guide to Puppet, Universe, and Everything - SLIDES
☆15 · Nov 29, 2019 · Updated 6 years ago
josephxsxn / moya
Memcached on YARN
☆19 · Jun 2, 2014 · Updated 12 years ago
phatak-dev / spark-3.0-examples
Examples of Spark 3.0
☆44 · Nov 11, 2020 · Updated 5 years ago
swipely / pipely
Visualize pipeline definitions for AWS Data Pipeline
☆23 · Feb 3, 2026 · Updated 5 months ago
devmindset / sparkscalainterview
Contains solutions to interview questions
☆12 · May 18, 2018 · Updated 8 years ago
mesosphere-backup / dcos-cassandra-service
DEPRECATED: Open source Apache Cassandra running on DC/OS is now replaced by mesosphere/dcos-commons/frameworks/cassandra. This repositor…
☆117 · May 1, 2019 · Updated 7 years ago
oasislabs / oasis-gateway
⛩ Developer mediated access to the Oasis Platform
☆23 · Oct 30, 2020 · Updated 5 years ago
apache / knox
Mirror of Apache Knox
☆218 · Updated this week
apache / hama
Mirror of Apache Hama
☆133 · Feb 11, 2020 · Updated 6 years ago
qubole / spark-acid
ACID Data Source for Apache Spark based on Hive ACID
☆97 · Jul 7, 2021 · Updated 5 years ago
aimichelle / cs186
Section slides for CS186.
☆10 · Jan 13, 2016 · Updated 10 years ago
kite-sdk / kite
Kite SDK
☆393 · Nov 1, 2022 · Updated 3 years ago
Bunyod / PracticalFPinScala
Practical FP in Scala book by Gabriel Volpe. Implementation with my view
☆17 · Aug 17, 2024 · Updated last year
TopSpoofer / hbrdd
A library for bulk-importing data into HBase from Spark
☆43 · Nov 18, 2016 · Updated 9 years ago
santoshjoshi / Apache-Kafka
Apache Kafka Overview
☆12 · Jun 9, 2023 · Updated 3 years ago
ifding / flex-bison
flex & bison (Lexical Analysis and Parsing)
☆12 · May 18, 2018 · Updated 8 years ago
Roasbeef / perm-crypt
A Golang implementation of the AES-FFX Format-Preserving Encryption Scheme
☆10 · Mar 14, 2015 · Updated 11 years ago
apache / crunch
Mirror of Apache Crunch (Incubating)
☆110 · Feb 2, 2021 · Updated 5 years ago
SaurabhChawla100 / spark-radiant
Spark-Radiant is an Apache Spark performance and cost optimizer
☆25 · Dec 31, 2024 · Updated last year
graphframes / graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
☆1,194 · Jun 23, 2026 · Updated 3 weeks ago
BenLangmead / myrna
Cloud-scale differential expression for RNA-seq
☆15 · May 23, 2023 · Updated 3 years ago
cloudera / director-sdk
Cloudera Director API clients
☆17 · May 20, 2022 · Updated 4 years ago
oracle / spark-oracle
On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
☆36 · Apr 15, 2025 · Updated last year