Projects:
- Apache Spark - A unified analytics engine for large-scale data processing☆39,296Updated this week
- Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce,…☆27,132Updated 5 months ago
- Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.☆25,928Updated this week
- Learn and understand Docker & container technologies, with real DevOps practice!☆24,685Updated last month
- Free Data Engineering course!☆24,521Updated 2 weeks ago
- A beginner's guide to big data☆15,733Updated 8 months ago
- GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.☆15,149Updated this week
- Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, source code analysis, and more. Topics include Flink Connector, Metrics, Library, DataStream API, Ta…☆14,483Updated 3 months ago
- List of Data Science Cheatsheets to rule the world☆14,447Updated 2 months ago
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.☆14,158Updated 2 weeks ago
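- [Big Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a stronger engineer!☆14,111Updated 10 months ago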
- Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and …☆13,605Updated this week
- Apache Doris is an easy-to-use, high performance and unified analytics database.☆12,315Updated this week
- Focused on big data learning and interview preparation; start your road to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...☆9,633Updated last year
- 🧙 Build, run, and manage data pipelines for integrating and transforming data.☆7,722Updated this week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…☆7,440Updated this week
- H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random F…☆6,861Updated this week
- Alluxio, data orchestration for analytics and machine learning in the cloud☆6,806Updated this week
- A Flexible and Powerful Parameter Server for large-scale machine learning☆6,725Updated 8 months ago
- Python SQL Parser and Transpiler☆6,395Updated this week
- Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.☆6,374Updated this week
- macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime …☆6,114Updated last year
- Simple and Distributed Machine Learning☆5,043Updated this week
- PipelineAI☆4,165Updated 5 months ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.☆3,871Updated last year
- State of the Art Natural Language Processing☆3,808Updated this week
- The Hunting ELK☆3,742Updated 3 months ago
- A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others☆3,497Updated last week
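- cube studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. Supports SSO login, multi-tenancy, big data platform integration, online notebook development, drag-and-drop task pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, VGPU, edge computing, serverless, a labeling platform, automated labeling…☆3,471Updated 3 weeks ago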
- Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more☆3,463Updated 2 years ago
- 🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (logic-heavy, UI-light) that is well suited for practice.☆3,405Updated 8 months ago
- Koalas: pandas API on Apache Spark☆3,329Updated 5 months ago
- Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications…☆3,289Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.☆3,252Updated last week
- Interactive and Reactive Data Science using Scala and Spark.☆3,150Updated last year
- DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitizati…☆3,045Updated 2 months ago
- REST job server for Apache Spark☆2,844Updated 2 months ago
- Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.☆2,744Updated this week
- Python clone of Spark, a MapReduce-like framework in Python☆2,691Updated 3 years ago
- Big data learning: learn big data from scratch, with study videos and interview materials for each stage of learning☆2,650Updated this week