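A Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and delta-lake, along with basic Scala exercises and source-code analyses (for example, master and shuffle), summaries, and translations.
☆18 · Jul 19, 2023 · Updated 2 years ago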
Alternatives and similar repositories for wow-spark
Users that are interested in wow-spark are comparing it to the libraries listed below.
- Spark stream from kafka(json) to s3(parquet) ☆15 · Nov 8, 2018 · Updated 7 years ago
- Delta Lake Examples ☆11 · Apr 24, 2020 · Updated 5 years ago
- Hands-on learning for common big data tools, including Hadoop, HBase, Kafka, ClickHouse, Hive, Redis, Zookeeper... ☆22 · Oct 5, 2022 · Updated 3 years ago
- Big data smart alarm by sql ☆12 · May 11, 2021 · Updated 4 years ago
- Kaggle sentiment analysis solution using RNN + attention ☆12 · Nov 17, 2017 · Updated 8 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase ☆14 · Mar 23, 2016 · Updated 10 years ago
- low-level helpers for Apache Spark libraries and tests ☆16 · Dec 29, 2018 · Updated 7 years ago
- An easy-to-use, scalable Spark Streaming ETL tool and SDK ☆13 · Aug 14, 2017 · Updated 8 years ago
- Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includ… ☆14 · Dec 25, 2024 · Updated last year
- Ansible scripts for deploying Kafka on EC2 ☆10 · Oct 7, 2016 · Updated 9 years ago
- Scala practice project covering Scala basics, Spark RDD, DataFrame, Spark SQL, and Spark interaction with HDFS, Phoenix, and HBase. ☆11 · Nov 11, 2022 · Updated 3 years ago
- Helpers for dealing with python.subprocess.Popen and paramiko. ☆18 · Mar 16, 2026 · Updated last week
- 1. Spark offline batch processing and real-time user click statistics; 2. SparkSQL log content analysis; 3. Audience movie analysis => (Kafka + SparkStreaming + Redis) and (Kafka + SparkStreaming + Mysql) ☆29 · Jun 21, 2022 · Updated 3 years ago
- Code for the book <数据化运营> (Data-Driven Operations) ☆33 · Feb 18, 2018 · Updated 8 years ago
- ☆20 · Apr 27, 2012 · Updated 13 years ago
- A library based on delta for Spark and MLSQL ☆60 · Dec 24, 2020 · Updated 5 years ago
- Offline installation of CDH 6.3.2 ☆11 · Nov 2, 2020 · Updated 5 years ago
- ☆12 · Mar 15, 2022 · Updated 4 years ago
- A Keras-based seq2seq implementation for English-to-Chinese translation ☆17 · Nov 17, 2020 · Updated 5 years ago
- NICTA Named Entity Recogniser is a rule-based Named Entity Recogniser which extracts named entities from text such as Organisation, Locat… ☆16 · Apr 15, 2023 · Updated 2 years ago
- Leveraging Hortonworks' HDP 3.1.0 and HDF 3.4.0 components, this tutorial guides the user through steps to stream data from a REST API in… ☆19 · Aug 16, 2019 · Updated 6 years ago
- kafka + structured streaming + phoenix + elasticsearch: popular-item recommendation, user-preference recommendation, and a recall fusion strategy, all built on behavior logs. ☆19 · Sep 5, 2023 · Updated 2 years ago
- Kaggle, JData, and other competition entries built with SparkML 2.0 ☆42 · Dec 14, 2022 · Updated 3 years ago
- A Python module to parse a Cypher query string and generate an AST. ☆16 · Feb 12, 2026 · Updated last month
- Demo of sharding jdbc configuration based on Java code ☆11 · Jun 21, 2022 · Updated 3 years ago
- Algorithms and Data Structures implemented in Java ☆12 · Jul 28, 2019 · Updated 6 years ago
- Examples for Spark 1.6 and Spark 2.2, covering kafka, flume, structuredstreaming, jedis, elasticsearch, mysql, dataframe ☆15 · Jan 28, 2018 · Updated 8 years ago
- Data Quality Monitoring Tool ☆15 · Dec 5, 2017 · Updated 8 years ago
- Demos for learning Apache Flink ☆10 · Jun 21, 2017 · Updated 8 years ago
- Implementation of EM/MV metrics based on N. Goix et al. ☆10 · Sep 20, 2024 · Updated last year
- QPS rate-limiting starter ☆29 · May 8, 2023 · Updated 2 years ago
- Code repository for Learning Apache Spark 2, published by Packt ☆21 · Jan 30, 2023 · Updated 3 years ago
- This project describes how to write a full ETL data pipeline using Spark. ☆15 · Oct 15, 2022 · Updated 3 years ago
- Secondary development on top of Alibaba Cloud DTS, using the SpringBoot framework for message format conversion and integrating a Kafka client ☆24 · May 21, 2021 · Updated 4 years ago
- A LangGraph-driven Multi-Agent for causal analysis that uses MCP, RAG, and other tools to assist the analysis and gives the user a complete causal analysis report and causal graph ☆34 · Mar 21, 2026 · Updated last week
- Custom data source for Spark Structured Streaming ☆12 · Jan 29, 2019 · Updated 7 years ago
- Kafka Connect connector for receiving data and writing data to Splunk. ☆25 · Nov 7, 2017 · Updated 8 years ago
- Sise supplicant exploit kit -- 华软蝴蝶 (Huaruan Butterfly) exploit toolkit ☆14 · Mar 11, 2016 · Updated 10 years ago
- Queries the Spark REST API for applications, jobs, stages, executors, rdds, streaming, environment, and other information to provide monitoring and alerting services ☆11 · Nov 22, 2018 · Updated 7 years ago