xdssio/big_data_benchmarks

Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for the purpose of analysis and machine learning.

☆65

Alternatives and similar repositories for big_data_benchmarks

Users that are interested in big_data_benchmarks are comparing it to the libraries listed below.

Spratiher9 / SparkDataset
Instant search for and access to many datasets in Pyspark.
☆34 · Oct 6, 2022 · Updated 3 years ago
jjthomas / rule_engine
Anomaly classification with rules
☆15 · Jul 21, 2022 · Updated 4 years ago
gligorijevic / DeepAttentionModel
☆12 · Mar 26, 2018 · Updated 8 years ago
svenevs / exhale-companion
Dummy repo for testing the doxygen - breathe - readthedocs build process.
☆11 · Jun 17, 2022 · Updated 4 years ago
at-tan / Cracking_Ames_Housing_OLS
Linear regression modelling of the Ames housing dataset, with the goal of predicting the house sale price, as published in Towards Data S…
☆10 · Oct 30, 2025 · Updated 8 months ago
PacktPublishing / Mobile-Artificial-Intelligence-Projects
Mobile Artificial Intelligence Projects, published by Packt
☆11 · Jan 30, 2023 · Updated 3 years ago
intel / hdk
A low-level execution library for analytic data processing.
☆32 · May 9, 2024 · Updated 2 years ago
devmindset / sparkscalainterview
Contains solutions to interview questions
☆12 · May 18, 2018 · Updated 8 years ago
myfjdthink / flink-playground
Try out Flink SQL + PyFlink with Docker, to help everyone understand Flink more deeply
☆12 · Aug 28, 2023 · Updated 2 years ago
tupol / spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
☆36 · Dec 15, 2024 · Updated last year
omribahumi / python_tornado_thrift
Using Python Tornado to serve Thrift HTTP requests
☆13 · Dec 22, 2012 · Updated 13 years ago
PacktPublishing / Interactive-Chatbots-with-TensorFlow-
Code Repository for Interactive Chatbots with TensorFlow[V], published by Packt
☆20 · Jan 18, 2021 · Updated 5 years ago
ziyanfeng / udacity-data-wrangling-mongodb
Assignments and Projects for Udacity's Data Wrangling with MongoDB course
☆16 · Oct 17, 2016 · Updated 9 years ago
scalingpythonml / scaling-python-with-dask
A work-in-progress book on Dask
☆12 · Jul 15, 2023 · Updated 3 years ago
santoshjoshi / Apache-Kafka
Apache Kafka Overview
☆12 · Jun 9, 2023 · Updated 3 years ago
jonas-eberle / geoportal
☆11 · Mar 31, 2021 · Updated 5 years ago
binwangwork / phdmacro
This is the course page for PhD-level advanced macroeconomics.
☆10 · Sep 13, 2021 · Updated 4 years ago
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,510 · Apr 1, 2026 · Updated 3 months ago
scravy / pysparkextra
☆10 · Jun 29, 2021 · Updated 5 years ago
stac-extensions / card4l
Describes how to comply with the CEOS CARD4L specifications (SAR and Optical) with STAC
☆12 · Oct 17, 2023 · Updated 2 years ago
joshuaulrich / xtsExtra
Supplementary xts functionality, and development platform for GSoC projects
☆14 · Feb 9, 2015 · Updated 11 years ago
ned14 / mcpp
A C99 conforming preprocessor
☆26 · Aug 27, 2020 · Updated 5 years ago
jbryer / irutils
An R package containing utilities for institutional researchers. This package is also used to support the Introduction to R and LaTeX doc…
☆15 · Mar 13, 2019 · Updated 7 years ago
tyleransom / mostly-harmless-replication
Replication of tables and figures from "Mostly Harmless Econometrics" in Stata, R, Python and Julia.
☆14 · Mar 21, 2019 · Updated 7 years ago
data-apis / array-api
RFC document, tooling and other content related to the array API standard
☆274 · Updated this week
awslabs / ai-powered-health-data-masking
A solution enabling customers to quickly deploy an architecture to identify and mask sensitive health data
☆26 · Jul 6, 2023 · Updated 3 years ago
auto-d1dact / spx_options_backtesting
Python Scripts for Backtesting SPX Put Strategies Using Black-Scholes Proxies
☆14 · Feb 15, 2018 · Updated 8 years ago
patrickbrus / TransferLearning_and_CMAP
This repository includes two jupyter notebooks. The first one retrains the already pre-trained ResNet-50 using transfer learning in order…
☆10 · Jul 23, 2020 · Updated 6 years ago
snap-contrib / snap-conda
SNAP as a conda package
☆13 · Jun 10, 2021 · Updated 5 years ago
rainy1998 / latex
☆14 · Dec 10, 2019 · Updated 6 years ago
cylondata / cylon
Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame.
☆303 · May 26, 2026 · Updated 2 months ago
cal-data-eng / sp21
Data Engineering Course Website
☆14 · Apr 2, 2026 · Updated 3 months ago
vgreg / python-se
Examples for computing regression standard errors in Python with statsmodels
☆14 · Feb 1, 2024 · Updated 2 years ago
expertcompsci / embeddedCypher
A portable, small, in memory and persistent graph database management system that implements the openCypher query language.
☆21 · Oct 10, 2019 · Updated 6 years ago
spitis / deepnorms
Code for An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality (ICLR 2020)
☆11 · Mar 24, 2023 · Updated 3 years ago
suhailrehman / fuzzydata
Fuzzy Data Benchmark
☆18 · Feb 8, 2024 · Updated 2 years ago
temple-geography / GUS-5073-Geovisualization
Instructor: Xiaojiang Li
☆17 · Sep 28, 2023 · Updated 2 years ago
mayhewsw / multilingual-t5
☆12 · Dec 30, 2020 · Updated 5 years ago
sergey-serebryakov / nns
Estimating neural network runtime characteristics
☆12 · Mar 25, 2023 · Updated 3 years ago