cartershanklin / csv-to-orc
Convert a CSV file to ORCFile
☆26Updated 6 years ago
Alternatives and similar repositories for csv-to-orc:
Users who are interested in csv-to-orc are comparing it to the libraries listed below.
- Presto K8S Operator☆9Updated 5 years ago
- Ansible playbooks for Apache Spark on kube☆27Updated 7 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol☆34Updated 2 years ago
- Starter project for building MemSQL Streamliner Pipelines☆32Updated 8 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆88Updated last year
- Combination of Dockerized Hortonworks projects and other Hadoop ecosystem components☆11Updated 5 years ago
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy☆61Updated last year
- Spark cloud integration: tests, cloud committers and more☆19Updated 2 months ago
- Test your Hive scripts inside your favorite IDE with HiveQLUnit! Increase your developers' productivity by testing on all operating system…☆39Updated 4 years ago
- Cloudbreak Deployer Tool☆34Updated last year
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator☆73Updated 5 years ago
- A High Performance Cluster Consumer for Kafka that creates Avro (boom) files in Hadoop in time based directory paths☆42Updated 8 years ago
- Example using Grafana with Druid☆11Updated 10 years ago
- Collection of HDP Tuning Tricks & Tips (unofficial guide)☆17Updated 7 years ago
- Scalable CDC Pattern Implemented using PySpark☆18Updated 5 years ago
- Real time and offline time series analysis with Spark, Spark Streaming and Storm☆21Updated 4 years ago
- ☆26Updated 5 years ago
- Ansible playbooks to construct distributed computing environments☆62Updated 3 years ago
- An application that records stats about consumer group offset commits and reports them as prometheus metrics☆14Updated 5 years ago
- A small project to show how to add lineage to Atlas when using Spark as ETL tool☆12Updated 8 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:…☆71Updated 2 years ago
- Using the Parquet file format (with Avro) to process data with Apache Flink☆14Updated 9 years ago
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…☆26Updated 3 years ago
- Schema Registry integration for Apache Spark☆40Updated 2 years ago
- Cascading on Apache Flink®☆54Updated last year
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.☆28Updated 7 years ago
- These are some code examples☆55Updated 5 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection☆18Updated 8 years ago
- Flink Examples☆39Updated 8 years ago
- ## Auto-archived due to inactivity. ## Simple JVM Profiler Using StatsD and Other Metrics Backends☆15Updated last year