enahwe / Csv2Hive
Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers schemas in big CSV files, generates the 'CREATE TABLE' statements, and creates Hive tables. You don't need to write any schemas at all. Csv2Hive is a really fast solution for integrating whole CSV files into your data lake.
☆27 Updated 7 years ago
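The description above (discover a schema in a CSV file, then generate a 'CREATE TABLE' statement) can be illustrated with a minimal sketch. This is not Csv2Hive's actual implementation: the function names `infer_type` and `create_table_statement` are hypothetical, the type inference only distinguishes BIGINT, DOUBLE and STRING, and it assumes the first row of the file is a header.

```python
import csv
import io


def infer_type(values):
    """Pick a Hive type for one column of raw strings (simplified assumption:
    only BIGINT, DOUBLE and STRING are considered; empty cells are ignored)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    for hive_type, cast in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                cast(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"


def create_table_statement(csv_text, table_name, delimiter=","):
    """Build a Hive CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in data if i < len(row)]
        columns.append(f"  `{name}` {infer_type(col_values)}")
    return (
        f"CREATE TABLE `{table_name}` (\n"
        + ",\n".join(columns)
        + f"\n)\nROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        + "STORED AS TEXTFILE\n"
        + "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical sample data, for illustration only.
sample = "id,price,name\n1,2.5,apple\n2,3,pear\n"
print(create_table_statement(sample, "fruits"))
```

A real tool would likely sample rows rather than read the whole file, and would handle dates, quoting and delimiter detection, none of which this sketch attempts.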
Alternatives and similar repositories for Csv2Hive
Users who are interested in Csv2Hive are comparing it to the libraries listed below.
- PySpark for Elastic Search ☆55 Updated 8 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆41 Updated 7 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 Updated 9 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 Updated 6 years ago
- Oracle Data Science Bootcamp 2014 ☆25 Updated 10 years ago
- ☆41 Updated 7 years ago
- This project is for examples of how to use Zeppelin. https://github.com/apache/incubator-zeppelin ☆25 Updated 9 years ago
- Building blocks and patterns for building data prep transformations and feature engineering in Spark. ☆16 Updated 9 years ago
- Python client for Spark Jobserver Rest API ☆39 Updated 5 years ago
- Training materials for Strata, AMP Camp, etc ☆149 Updated 9 years ago
- A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR ☆118 Updated 9 years ago
- Example of use of Spark Streaming with Kafka ☆90 Updated 10 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 Updated 5 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 Updated 9 years ago
- ☆146 Updated 9 years ago
- an example of integrating Spark Streaming with Google Pub/Sub and Google Datastore ☆17 Updated 8 years ago
- Structured Streaming Machine Learning example with Spark 2.0 ☆92 Updated 8 years ago
- A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB ☆94 Updated 4 years ago
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- CustomerML is an open source customer science platform leveraging the power of Predictiveworks and fully integrated with Elasticsearch an… ☆48 Updated 10 years ago
- Anomaly Detection model uses Spark for training and Spark Streaming for testing ☆67 Updated 9 years ago
- Zeppelin notebook examples ☆25 Updated 9 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year
- HDP Data Science/Machine Learning demo ☆37 Updated 9 years ago
- Load a CSV (or TSV) file into an Elasticsearch instance ☆62 Updated 2 years ago
- functionstest ☆33 Updated 8 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers. ☆39 Updated last year
- Docker compose files for various kafka stacks ☆32 Updated 7 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 Updated 8 years ago
- Based off the design of SparkOnHBase. This Repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu. ☆50 Updated 9 years ago