enahwe / Csv2Hive
Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers schemas in big CSV files, generates the 'CREATE TABLE' statements, and creates the Hive tables. You don't need to write any schemas at all. Csv2Hive is a really fast solution for integrating entire CSV files into your data lake.
☆27 · Updated 7 years ago
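To illustrate the idea described above (scan a sample of a CSV file, infer a type for each column, and emit a Hive `CREATE TABLE` statement), here is a minimal Python sketch. It is not Csv2Hive's code or command-line interface. The function names (`infer_type`, `sanitize`, `create_table_statement`), the INT/BIGINT/DOUBLE/STRING type ladder, the column-name cleanup rule, and the 1000-row sample size are all hypothetical choices made for this example.

```python
import csv
import io
import re

# Hypothetical inference rules for this sketch (not Csv2Hive's own logic):
# try INT, then BIGINT, then DOUBLE, and fall back to STRING.
_INT_RE = re.compile(r"^[+-]?\d+$")
_FLOAT_RE = re.compile(r"^[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?$")
_INT32_MIN, _INT32_MAX = -(2**31), 2**31 - 1


def infer_type(values):
    """Return the narrowest Hive type that fits every non-empty value."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "STRING"  # an all-empty column gives no evidence, so keep it generic
    if all(_INT_RE.match(c) for c in cells):
        if all(_INT32_MIN <= int(c) <= _INT32_MAX for c in cells):
            return "INT"
        return "BIGINT"
    if all(_FLOAT_RE.match(c) for c in cells):
        return "DOUBLE"
    return "STRING"


def sanitize(name, index):
    """Turn a CSV header into a lowercase identifier (hypothetical naming rule)."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    return cleaned or f"col_{index}"


def create_table_statement(csv_text, table, delimiter=",", sample_rows=1000):
    """Infer column types from the first `sample_rows` data rows and build DDL."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV input has no header row")
    columns = [[] for _ in header]
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for j, cell in enumerate(row[: len(header)]):
            columns[j].append(cell)
    names = [sanitize(h, j) for j, h in enumerate(header)]
    types = [infer_type(col) for col in columns]
    cols_sql = ",\n".join(f"  `{n}` {t}" for n, t in zip(names, types))
    # Render control characters such as a tab as the escape Hive expects ('\t').
    sep = delimiter.encode("unicode_escape").decode("ascii")
    return (
        f"CREATE TABLE {table} (\n{cols_sql}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{sep}'\n"
        "STORED AS TEXTFILE\n"
        'TBLPROPERTIES ("skip.header.line.count"="1");'
    )


if __name__ == "__main__":
    sample = "id,Price (USD),name\n1,9.99,apple\n2,,banana\n3000000000,1e3,cherry\n"
    print(create_table_statement(sample, "fruits"))
```

With the sample input, `id` becomes BIGINT (3000000000 exceeds the 32-bit range), `price_usd` becomes DOUBLE (the empty cell is ignored), and `name` stays STRING. The sketch emits Hive's plain delimited text format, which does not understand quoted fields containing the delimiter; Hive's `OpenCSVSerde` handles quoting but exposes every column as STRING, so a real tool has to trade one limitation against the other.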
Alternatives and similar repositories for Csv2Hive
Users who are interested in Csv2Hive are comparing it to the libraries listed below.
- PySpark for Elasticsearch ☆55 · Updated 8 years ago
- ☆41 · Updated 7 years ago
- Notebook examples related to Apache Spark, IPython/Jupyter, and Zeppelin ☆52 · Updated 9 years ago
- Examples of how to use Zeppelin. https://github.com/apache/incubator-zeppelin ☆25 · Updated 9 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase ☆14 · Updated 9 years ago
- A lightweight Kafka-to-HDFS/S3 ETL library based on Apache Spark ☆41 · Updated 7 years ago
- Spark-cloud is a set of scripts for starting Spark clusters on EC2 ☆12 · Updated 9 years ago
- A plugin for Apache Airflow that lets you run Spark Submit commands as an Operator ☆73 · Updated 5 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark ☆51 · Updated 11 years ago
- PMML evaluator library for the Apache Hive data warehouse software (legacy codebase) ☆13 · Updated 10 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics. ☆110 · Updated 2 years ago
- Zeppelin notebook examples ☆26 · Updated 9 years ago
- A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR ☆118 · Updated 9 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 · Updated 9 years ago
- A real-time streaming implementation of Markov-chain-based fraud detection ☆23 · Updated 10 years ago
- Examples for Fast Data Processing with Spark ☆59 · Updated 11 years ago
- Simple Spark example of generating table stats for data quality checks ☆28 · Updated 8 years ago
- A plugin for Apache Airflow that allows you to manage the users who can log in ☆14 · Updated 5 years ago
- ☆146 · Updated 9 years ago
- Beyond Piwik Analytics with Scala and Apache Spark ☆46 · Updated 10 years ago
- An Apache Spark-shell backend for IPython ☆105 · Updated 3 years ago
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 · Updated 9 years ago
- Python client for the Spark Jobserver REST API ☆39 · Updated 5 years ago
- A rough prototype of a tool for discovering Apache Hive schemas from JSON documents. ☆42 · Updated last year
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Updated last year
- Experiments with Ooyala's Spark Job Server ☆21 · Updated 10 years ago
- Simplify getting Zeppelin up and running ☆56 · Updated 8 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 · Updated 6 years ago
- Provides a Pythonic interface for reading and writing Avro schemas ☆27 · Updated 2 years ago
- functionstest ☆33 · Updated 8 years ago