enahwe / Csv2Hive
Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers the schemas of large CSV files, generates the 'CREATE TABLE' statements, and creates the Hive tables, so you don't need to write any schemas yourself. Csv2Hive is a really fast way to integrate entire CSV files into your data lake.
☆27 · Updated 7 years ago
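Csv2Hive's own code is not reproduced on this page. The following minimal Python sketch illustrates the general idea described above: sample the rows of a CSV file, infer a Hive type for each column, and emit a 'CREATE TABLE' statement. It assumes a file with a header row and a small set of Hive types (BOOLEAN, INT, BIGINT, DOUBLE, STRING). The functions `infer_hive_type`, `sanitize_column`, and `csv_to_hive_ddl` are hypothetical names chosen for illustration and are not Csv2Hive's API.

```python
import csv
import io
import re

# Hypothetical helpers for illustration; not Csv2Hive's actual API.
INT_RE = re.compile(r"[+-]?[0-9]+")
DOUBLE_RE = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?")
INT_RANGE = (-2**31, 2**31 - 1)      # Hive INT is 32-bit signed
BIGINT_RANGE = (-2**63, 2**63 - 1)   # Hive BIGINT is 64-bit signed


def infer_hive_type(values):
    """Pick the narrowest Hive type that every non-empty sampled value fits."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "STRING"  # nothing to go on; fall back to the most general type
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    if all(INT_RE.fullmatch(v) for v in non_empty):
        ints = [int(v) for v in non_empty]
        if all(INT_RANGE[0] <= i <= INT_RANGE[1] for i in ints):
            return "INT"
        if all(BIGINT_RANGE[0] <= i <= BIGINT_RANGE[1] for i in ints):
            return "BIGINT"
    if all(DOUBLE_RE.fullmatch(v) for v in non_empty):
        return "DOUBLE"
    return "STRING"


def sanitize_column(name, position):
    """Turn a header cell into a lowercase Hive-friendly identifier."""
    cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", name.strip()).lower()
    return cleaned or f"col_{position}"


def csv_to_hive_ddl(csv_text, table_name, delimiter=",", sample_rows=1000):
    """Sample up to `sample_rows` data rows and build a CREATE TABLE statement."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV input has no header row")
    sample = []
    for row in reader:
        if len(sample) >= sample_rows:
            break
        sample.append(row)
    columns = []
    for idx, name in enumerate(header):
        # Short rows simply contribute no value for the missing columns.
        values = [row[idx] for row in sample if idx < len(row)]
        columns.append((sanitize_column(name, idx), infer_hive_type(values)))
    cols_sql = ",\n".join(f"  `{col}` {typ}" for col, typ in columns)
    # Escape control characters such as a tab so Hive sees '\t', not a raw tab.
    hive_delim = delimiter.encode("unicode_escape").decode("ascii")
    return (
        f"CREATE TABLE {table_name} (\n{cols_sql}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{hive_delim}'\n"
        "STORED AS TEXTFILE\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


if __name__ == "__main__":
    sample_csv = (
        "id,name,score,active\n"
        "1,alice,3.5,true\n"
        "2,bob,,false\n"
        "3000000000,carol,4,true\n"
    )
    print(csv_to_hive_ddl(sample_csv, "people"))
```

On the sample above, `id` should come out as BIGINT (3000000000 exceeds the INT range), `score` as DOUBLE (the empty cell is ignored), and `active` as BOOLEAN. One design caveat: `ROW FORMAT DELIMITED` does not understand quoted fields that contain the delimiter. For such files Hive's `OpenCSVSerde` is an option, although it exposes every column as STRING.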
Alternatives and similar repositories for Csv2Hive:
Users who are interested in Csv2Hive are comparing it to the libraries listed below:
- PySpark for Elastic Search ☆55 · Updated 8 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Updated 6 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers. ☆39 · Updated last year
- Training materials for Strata, AMP Camp, etc ☆149 · Updated 9 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 · Updated 8 years ago
- ☆49 · Updated 5 years ago
- A plugin for Apache Airflow that allows you to manage the users that can login ☆14 · Updated 5 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆41 · Updated 7 years ago
- Zeppelin notebook examples ☆26 · Updated 9 years ago
- This project is for examples of how to use Zeppelin. https://github.com/apache/incubator-zeppelin ☆25 · Updated 9 years ago
- Beyond Piwik Analytics with Scala and Apache Spark ☆46 · Updated 10 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 · Updated 5 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Updated last year
- Data and code for "Fast Data Applications with Spark and Python" ☆25 · Updated 8 years ago
- Demonstration workflows for hadoop batch jobs ☆8 · Updated 9 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 · Updated 9 years ago
- Coding exercises for Apache Spark ☆104 · Updated 9 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark ☆51 · Updated 11 years ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. ☆96 · Updated 4 years ago
- REST-like API exposing Airflow data and operations ☆61 · Updated 6 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 · Updated 8 years ago
- A real time streaming implementation of markov chain based fraud detection ☆23 · Updated 10 years ago
- Complete Pipeline Training at Big Data Scala By the Bay ☆71 · Updated 9 years ago
- ☆146 · Updated 9 years ago
- Anomaly Detection model uses Spark for training and Spark Streaming for testing ☆67 · Updated 9 years ago
- Simple Spark example of generating table stats for use of data quality checks ☆28 · Updated 8 years ago
- An Apache Spark-shell backend for IPython ☆105 · Updated 3 years ago
- Load a CSV (or TSV) file into an Elasticsearch instance ☆61 · Updated 2 years ago
- functionstest ☆33 · Updated 8 years ago
- A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB ☆94 · Updated 4 years ago