prompt-spark / stackexchange-spark-scala-analyser
Still in Beta
☆17 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for stackexchange-spark-scala-analyser
- Filter lines from standard input according to some probability, with a given delay, and for a certain duration. ☆24 · Updated last year
- Extract, PreProcess, and Analyze big data on GPUs ☆21 · Updated 6 years ago
- pysh-db - The Data Science Toolkit (DSK) ☆14 · Updated 5 years ago
- This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight. ☆11 · Updated 4 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆27 · Updated 2 years ago
- Using Luigi to create a Machine Learning Pipeline using the Rossman Sales data from Kaggle ☆33 · Updated 8 years ago
- bamboolib - template for creating your own binder notebook ☆21 · Updated 2 years ago
- Build a data catalog by running a single line of code ☆16 · Updated 2 months ago
- 📝 A blog post about report generation and automation in python ☆40 · Updated 5 years ago
- Public Neo4j Knowledge Base ☆21 · Updated last month
- Python bindings for Matroid API ☆16 · Updated last month
- Repo demonstrating a Dagster pipeline to generate Neo4j Graph ☆21 · Updated 3 years ago
- Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018). ☆14 · Updated 5 years ago
- ☆13 · Updated 5 years ago
- ☆13 · Updated last year
- Analytics on Apache Projects for Diversity ☆18 · Updated 5 years ago
- ☆10 · Updated 3 years ago
- Awesome list of dataops products, open source and resources ☆24 · Updated 2 years ago
- Snippets of code used in blog posts and other media. ☆13 · Updated last year
- ☆14 · Updated 4 years ago
- Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb ☆18 · Updated last year
- An easy-to-use tool to generate fake/dummy data in bulk and export it as JSON, CSV, Avro or directly into your database as tables. Writte… ☆9 · Updated 5 years ago
- Blog post on ETL pipelines with Airflow ☆23 · Updated 4 years ago
- Events about the open source data stack ☆13 · Updated 2 years ago
- Server that simplifies connecting pandas to a real-time data feed, testing hypotheses and visualizing results in a web browser ☆33 · Updated last year
- datascienv is a package that helps you set up your environment in a single line of code with all dependencies, and it also includes pyforest… ☆58 · Updated 3 years ago
- Repository of Notebooks taken from https://neo4j.com/graph-algorithms-book/ ☆26 · Updated 4 years ago
- Examples of vector DB indexing and query with various vector databases. ☆12 · Updated last month
- matching between unstructured and structured data sets ☆14 · Updated 6 years ago