linkedin/spark

Apache Spark - A unified analytics engine for large-scale data processing

☆16

Alternatives and similar repositories for spark

Users that are interested in spark are comparing it to the libraries listed below.

oap-project / pmem-shuffle
Spark* Shuffle plugin for support shuffling through remote persistent memory over fabrics, which leverages the RDMA network and remote pe…
☆14 · Sep 18, 2023 · Updated 2 years ago
HKU-BAL / MegaGTA
HMM-guided metagenomic gene-targeted assembler using iterative de Bruijn graphs
☆18 · Oct 3, 2016 · Updated 9 years ago
voltrondata / spark-substrait-gateway
Implements a gateway that speaks the SparkConnect protocol and drives a backend using Substrait (over ADBC Flight SQL).
☆19 · Feb 10, 2025 · Updated last year
databricks / congruity
The goal of this library is to provide a compatibility layer that makes it easier to adopt Spark Connect. The library is designed to be s…
☆18 · Nov 25, 2024 · Updated last year
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
althonos / gb-io.py
A Python interface to gb-io, a fast GenBank parser written in Rust.
☆24 · May 21, 2026 · Updated 2 months ago
ayaanhossain / ViennaRNA
ViennaRNA Package consists of a C code library for the prediction and comparison of RNA secondary structures
☆15 · May 20, 2022 · Updated 4 years ago
aws / go-kafka-event-source
Go/Kafka client library for developing event sourcing applications
☆12 · Jul 1, 2026 · Updated 2 weeks ago
oap-project / remote-shuffle
Spark* shuffle plugin for support shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis…
☆21 · Mar 15, 2024 · Updated 2 years ago
oracle / nosql-node-sdk
Node.js SDK for Oracle NoSQL Database
☆13 · Jul 15, 2026 · Updated last week
cloudera / cdp-sdk-java
Cloudera CDP SDK for Java
☆17 · Jul 10, 2026 · Updated last week
whiterabb17 / SpyCore
SpyCore - Windows Malicious File Scanner (Distributes)
☆14 · Jun 10, 2023 · Updated 3 years ago
monero-integrations / monerogo
Go library for Monero RPC
☆12 · Dec 17, 2017 · Updated 8 years ago
scholzj / kafka-kubernetes-authenticator
Kafka Kubernetes Authenticator and Authorizer
☆12 · Sep 5, 2023 · Updated 2 years ago
datacenterdude / ai-agent-team-templates
Role-based AI agent team templates for enterprise IT and vendor professionals. Spin up a fully structured multi-agent team for your job f…
☆18 · May 21, 2026 · Updated 2 months ago
kimrutherford / EMBOSS
MIRROR OF: The European Molecular Biology Open Software Suite (from git://anonscm.debian.org/debian-med/emboss.git)
☆32 · Feb 18, 2022 · Updated 4 years ago
ray-project / contrib-workflow-dag
☆11 · May 4, 2022 · Updated 4 years ago
gsmake / gsmake-go
gsdocker gradle like build tool
☆19 · Sep 3, 2015 · Updated 10 years ago
googleapis / nodejs-dataproc
This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.
☆14 · Jul 13, 2023 · Updated 3 years ago
implementing-microservices / gevent-store
Event Store implementation in Go
☆14 · May 27, 2019 · Updated 7 years ago
openshift / pagerduty-operator
A PagerDuty Operator that lives on Hive
☆21 · Updated this week
AdamSLevy / jsonrpc2
Golang package for implementing a JSON RPC 2.0 server or client.
☆12 · Jul 6, 2023 · Updated 3 years ago
CryptoBridge / bridgecoin
☆21 · Nov 24, 2017 · Updated 8 years ago
linkedin / Avro2TF
Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.
☆129 · May 9, 2020 · Updated 6 years ago
salesforce / orchard
☆18 · Jul 13, 2026 · Updated last week
shiyanlou / datastructure_code
☆11 · Nov 21, 2014 · Updated 11 years ago
huangchong94 / CS144labs
Stanford introduction to computer networking labs
☆15 · Sep 5, 2018 · Updated 7 years ago
nv-morpheus / MRC
Morpheus Runtime Core (MRC)
☆53 · Jan 22, 2026 · Updated 6 months ago
linka-cloud / k8s-dns-manager
Host DNS server and manage records inside Kubernetes Clusters
☆20 · May 23, 2026 · Updated last month
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆17 · Aug 4, 2022 · Updated 3 years ago
Mellanox / SparkRDMA
This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid…
☆257 · May 13, 2019 · Updated 7 years ago
pytorch / ci-infra
☆16 · Updated this week
googleapis / nodejs-bigquery-data-transfer
This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.
☆12 · Jul 20, 2023 · Updated 3 years ago
oap-project / sql-ds-cache
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
☆37 · Jan 3, 2023 · Updated 3 years ago
tzm529 / weed-fs
Weed-FS is a simple and highly scalable distributed file system.
☆20 · Sep 29, 2013 · Updated 12 years ago
annetteplatform / annette
Platform to build distributed, scalable, enterprise-wide business applications
☆19 · Jun 21, 2024 · Updated 2 years ago
bakdata / kpops
Deploy Kafka pipelines to Kubernetes
☆15 · Jul 2, 2026 · Updated 2 weeks ago
epiphanous / flinkrunner
A library to support building a coherent set of flink jobs
☆17 · Oct 5, 2024 · Updated last year
oneconcern / datamon
Datamon manages infinite reflections of data
☆14 · Apr 7, 2023 · Updated 3 years ago