enahwe / Csv2Hive
Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers schemas in big CSV files, generates the 'CREATE TABLE' statements and creates Hive tables. You don't need to write any schemas at all. Csv2Hive is a really fast solution for integrating whole CSV files into your DataLake.
☆27Updated 8 years ago
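The description above (discover a schema, then generate a 'CREATE TABLE' statement) can be illustrated with a minimal Python sketch. This is not Csv2Hive's actual code or command-line interface; the function names `infer_type` and `create_table_statement`, the three-type mapping (BIGINT, DOUBLE, STRING), and the assumption that the first row is a header are all hypothetical choices made for illustration.

```python
import csv
import io


def infer_type(values):
    """Return a Hive type that fits every non-empty value (hypothetical helper)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        # Nothing to infer from, so fall back to the most permissive type.
        return "STRING"
    # Try the narrowest type first, then widen.
    for hive_type, cast in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                cast(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"


def create_table_statement(csv_text, table_name, delimiter=","):
    """Build a Hive CREATE TABLE statement from CSV text (assumes a header row)."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        # Collect the i-th field of every row that is long enough to have one.
        col_values = [row[i] for row in data if i < len(row)]
        columns.append(f"`{name}` {infer_type(col_values)}")
    cols = ",\n  ".join(columns)
    return (
        f"CREATE TABLE `{table_name}` (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


if __name__ == "__main__":
    sample = "id,price,name\n1,2.5,apple\n2,3,pear\n"
    print(create_table_statement(sample, "fruits"))
```

A real tool would also need to sample large files rather than read them whole, and to handle dates, quoting, and malformed rows; the sketch leaves all of that out.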
Alternatives and similar repositories for Csv2Hive
Users who are interested in Csv2Hive are comparing it to the libraries listed below
- Code reference from my Qbox blog posts.☆87Updated 10 years ago
- PySpark for Elastic Search☆55Updated 8 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)☆74Updated 2 years ago
- Coding exercises for Apache Spark☆104Updated 10 years ago
- A platform for real-time streaming search☆102Updated 9 years ago
- ☆146Updated 9 years ago
- Training materials for Strata, AMP Camp, etc☆148Updated 10 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin☆52Updated 9 years ago
- An Apache Spark-shell backend for IPython☆105Updated 4 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse…☆90Updated 10 years ago
- Visualize streaming machine learning in Spark☆177Updated 8 years ago
- PredictionIO Python SDK☆196Updated 7 years ago
- Beyond Piwik Analytics with Scala and Apache Spark☆46Updated 11 years ago
- Elasticsearch entity resolution plugin based on Duke☆209Updated 5 years ago
- This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch.☆212Updated 11 years ago
- Real-time Machine Learning with Apache Spark on Twitter Public Stream☆68Updated 8 years ago
- Word2Vec models with Twitter data using Spark. Blog:☆66Updated 6 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers.☆39Updated 2 years ago
- This project provides association rule mining for Apache Spark. The algorithms are based on the work of Philippe Fournier-Viger and comp…☆31Updated 10 years ago
- Hadoop, Spark and Storm based anomaly detection implementations for data quality, cyber security, fraud detection etc.☆128Updated last year
- Gallery of Apache Zeppelin notebooks☆216Updated 6 years ago
- Load a CSV (or TSV) file into an Elasticsearch instance☆62Updated 3 years ago
- Send summary messages of your Luigi jobs to Slack☆46Updated 6 years ago
- Flask app to push/pull on Kafka over HTTP☆41Updated 10 years ago
- Example project showing how to use Hive UDFs in Apache Spark☆55Updated 6 years ago
- PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.☆79Updated 8 years ago
- Spark Extension : ML transformers, SQL aggregations, etc that are missing in Apache Spark☆146Updated 9 years ago
- Elastic Search on Spark☆112Updated 11 years ago
- This project contains the code to translate between Apache Spark and SFrame.☆20Updated 9 years ago
- Zeppelin notebook examples☆25Updated 9 years ago