enahwe / Csv2Hive
Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers schemas in large CSV files, generates the 'CREATE TABLE' statements, and creates Hive tables. You don't need to write any schemas at all. Csv2Hive is a fast way to integrate whole CSV files into your data lake.
☆27 Updated 7 years ago
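To make the idea in the description concrete, here is a minimal Python sketch of CSV schema discovery followed by 'CREATE TABLE' generation. It is not Csv2Hive's actual implementation: the function names `infer_type` and `create_table_statement`, the three-type mapping (BIGINT, DOUBLE, STRING), and the choice to read the whole sample into memory are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical, deliberately strict patterns for this sketch.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def infer_type(values):
    """Return the narrowest Hive type that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if not non_empty:
        return "STRING"  # nothing to infer from, fall back to STRING
    if all(INT_RE.fullmatch(v) for v in non_empty):
        return "BIGINT"
    if all(FLOAT_RE.fullmatch(v) for v in non_empty):
        return "DOUBLE"
    return "STRING"


def create_table_statement(csv_text, table_name, delimiter=","):
    """Build a Hive CREATE TABLE statement from CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    columns = []
    for i, name in enumerate(header):
        # Short rows are tolerated: a missing cell is simply skipped.
        values = [row[i] for row in rows if i < len(row)]
        columns.append(f"  `{name.strip()}` {infer_type(values)}")
    return (
        f"CREATE TABLE `{table_name}` (\n"
        + ",\n".join(columns)
        + "\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


if __name__ == "__main__":
    sample = "id,price,name\n1,9.5,apple\n2,,pear\n"
    print(create_table_statement(sample, "fruits"))
```

A real tool would most likely sample a large file rather than load it fully, and would also have to handle dates, quoting, and delimiter detection, none of which this sketch attempts.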
Alternatives and similar repositories for Csv2Hive:
Users who are interested in Csv2Hive are comparing it to the repositories listed below.
- PySpark for Elastic Search ☆55 Updated 8 years ago
- Oracle Data Science Bootcamp 2014 ☆25 Updated 9 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 Updated 8 years ago
- Beyond Piwik Analytics with Scala and Apache Spark ☆46 Updated 10 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers. ☆39 Updated last year
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark ☆51 Updated 11 years ago
- A real-time streaming implementation of Markov chain-based fraud detection ☆23 Updated 10 years ago
- ☆146 Updated 9 years ago
- Coding exercises for Apache Spark ☆104 Updated 9 years ago
- This project is for examples of how to use Zeppelin. https://github.com/apache/incubator-zeppelin ☆25 Updated 9 years ago
- Example of using Spark Streaming with Kafka ☆90 Updated 10 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 Updated 9 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆41 Updated 7 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 Updated 5 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 Updated 5 years ago
- Elastic Search on Spark ☆112 Updated 10 years ago
- Zeppelin notebook examples ☆26 Updated 9 years ago
- Training materials for Strata, AMP Camp, etc ☆149 Updated 9 years ago
- A platform for real-time streaming search ☆103 Updated 9 years ago
- Additional useful algorithms that can be used with Spark. ☆24 Updated 10 years ago
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 Updated 9 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year
- NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase ☆51 Updated 10 years ago
- Reference Architectures for Apache Spark ☆38 Updated 8 years ago
- Coursera Machine Learning class examples in Spark ☆43 Updated 11 years ago
- Examples for Fast Data Processing with Spark ☆59 Updated 11 years ago
- Python Client for WebHDFS REST API ☆43 Updated 9 years ago
- Starter project for building MemSQL Streamliner Pipelines ☆32 Updated 7 years ago
- Supporting content (slides and exercises) for the Addison-Wesley (Pearson) video series covering best practices for developing scalable S… ☆66 Updated 9 years ago
- An example of using Avro and Parquet in Spark SQL ☆60 Updated 9 years ago