enahwe / Csv2Hive
Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers schemas in large CSV files, generates the 'CREATE TABLE' statements, and creates Hive tables. You don't need to write any schemas at all. Csv2Hive is a really fast solution for integrating whole CSV files into your data lake.
☆27 Updated 7 years ago
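The workflow described above (scan a CSV file, infer column types, emit a 'CREATE TABLE' statement) can be illustrated with a minimal Python sketch. This is not Csv2Hive's actual code: the function names `infer_type` and `create_table_statement` are hypothetical, the type inference is limited to BIGINT, DOUBLE, and STRING, and the whole file is assumed to fit in memory.

```python
import csv
import io


def infer_type(values):
    """Pick BIGINT, DOUBLE, or STRING based on what Python's int()/float() accept.

    Hypothetical helper for illustration; empty strings are treated as nulls.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    for hive_type, cast in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                cast(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"


def create_table_statement(csv_text, table_name, delimiter=","):
    """Build a Hive CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in data if i < len(row)]
        columns.append(f"  `{name}` {infer_type(col_values)}")
    return (
        f"CREATE TABLE `{table_name}` (\n"
        + ",\n".join(columns)
        + f"\n)\nROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        + "STORED AS TEXTFILE\n"
        # Tell Hive to skip the header line when reading the file.
        + 'TBLPROPERTIES ("skip.header.line.count"="1")'
    )


if __name__ == "__main__":
    sample = "id,price,name\n1,2.5,foo\n2,3,bar\n"
    print(create_table_statement(sample, "sales"))
```

A real tool would sample large files rather than load them whole, and would likely recognize more types (dates, timestamps, booleans) than this sketch does.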
Alternatives and similar repositories for Csv2Hive
Users that are interested in Csv2Hive are comparing it to the libraries listed below
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 Updated 9 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 Updated 9 years ago
- ☆146 Updated 9 years ago
- PySpark for Elastic Search ☆55 Updated 8 years ago
- An Apache Spark-shell backend for IPython ☆105 Updated 4 years ago
- Oracle Data Science Bootcamp 2014 ☆25 Updated 10 years ago
- Code reference from my Qbox blog posts. ☆87 Updated 10 years ago
- Coding exercises for Apache Spark ☆104 Updated 10 years ago
- Training materials for Strata, AMP Camp, etc ☆149 Updated 9 years ago
- Visualize streaming machine learning in Spark ☆177 Updated 8 years ago
- A simple tool for plotting Spark ML's Decision Trees ☆40 Updated 3 years ago
- Streaming tweets with spark, language detection & sentiment analysis, dashboard with Kibana ☆103 Updated 9 years ago
- Spark Extension : ML transformers, SQL aggregations, etc that are missing in Apache Spark ☆147 Updated 9 years ago
- Learn the pyspark API through pictures and simple examples ☆170 Updated 4 years ago
- PySpark Cassandra brings back the fun in working with Cassandra data in PySpark. ☆79 Updated 8 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year
- Live-updating Spark UI built with Meteor ☆189 Updated 4 years ago
- Anomaly Detection model uses Spark for training and Spark Streaming for testing ☆67 Updated 9 years ago
- Elastic Search on Spark ☆112 Updated 10 years ago
- Zeppelin notebook examples ☆25 Updated 9 years ago
- Elasticsearch entity resolution plugin based on Duke ☆209 Updated 5 years ago
- This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch. ☆211 Updated 10 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 Updated 6 years ago
- Gallery of Apache Zeppelin notebooks ☆216 Updated 6 years ago
- Sparkling Pandas ☆364 Updated 2 years ago
- A short guide for transitioning from Python to Scala ☆65 Updated 9 years ago
- PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/) ☆94 Updated 3 years ago
- ☆110 Updated 8 years ago
- Vagrant project to spin up a single virtual machine running current versions of Hadoop, Hive and Spark ☆74 Updated 6 years ago
- A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR ☆118 Updated 9 years ago