cartershanklin / csv-to-orc
Convert a CSV file to ORCFile
☆26 Updated 5 years ago
Related projects
Alternatives and complementary repositories for csv-to-orc
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 Updated 7 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 Updated 2 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 Updated 3 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 Updated 8 months ago
- Schema Registry integration for Apache Spark ☆39 Updated last year
- Random implementation notes ☆33 Updated 11 years ago
- File compaction tool that runs on top of the Spark framework. ☆59 Updated 5 years ago
- Spark Structured Streaming State Tools ☆34 Updated 4 years ago
- Type-class based data cleansing library for Apache Spark SQL ☆79 Updated 5 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Collection of HDP Tuning Tricks & Tips (unofficial guide) ☆17 Updated 7 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆48 Updated 10 months ago
- A Spark datasource for the HadoopOffice library ☆39 Updated 2 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 Updated last year
- Presto K8S Operator ☆9 Updated 4 years ago
- Spark structured streaming with Kafka data source and writing to Cassandra ☆64 Updated 4 years ago
- Spark cloud integration: tests, cloud committers and more ☆19 Updated 8 months ago
- A small project to show how to add lineage to Atlas when using Spark as ETL tool ☆12 Updated 7 years ago
- Paper: A Zero-rename committer for object stores ☆20 Updated 3 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Updated 4 years ago
- Camus Compressor merges files created by Camus and saves them in a compressed format. ☆12 Updated last year
- These are some code examples ☆55 Updated 4 years ago
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub ☆36 Updated 6 years ago
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different. ☆28 Updated 6 years ago
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 Updated 8 months ago
- JSON schema parser for Apache Spark ☆81 Updated 2 years ago
- Hadoop Data Pipeline using Falcon ☆15 Updated 8 years ago
- Ansible playbooks for Apache Spark on kube ☆27 Updated 7 years ago