LinkedInAttic / apache-incubator-gobblin
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
☆11 · Updated 7 years ago
Related projects
Alternatives and complementary repositories for apache-incubator-gobblin
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy ☆60 · Updated 11 months ago
- Demonstration of a Hive Input Format for Iceberg ☆26 · Updated 3 years ago
- An example of building a Kubernetes operator (Flink) using Abstract operator's framework ☆26 · Updated 5 years ago
- Mirror of Apache Tephra (Incubating) ☆31 · Updated last year
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 · Updated last year
- Temporal_Graph_library ☆25 · Updated 5 years ago
- Spooker is a dynamic framework for processing high volume data streams via processing pipelines ☆29 · Updated 8 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆71 · Updated last year
- A shim for using Cassandra as a backend for OpenTSDB. Not to be used as a general Cassandra client. ☆7 · Updated 5 years ago
- Flink Examples ☆39 · Updated 8 years ago
- Thoughts on things I find interesting. ☆17 · Updated 3 years ago
- A template-based cluster provisioning system ☆61 · Updated last year
- LinkedIn's version of Apache Calcite ☆22 · Updated 2 weeks ago
- Llama - Low Latency Application MAster ☆34 · Updated 2 years ago
- Connect DBVisualizer to Hortonworks HiveServer2 ☆9 · Updated 9 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark ☆51 · Updated 10 years ago
- Fast and scalable timeseries database ☆25 · Updated 4 years ago
- Read druid segments from hadoop ☆10 · Updated 7 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 · Updated 2 years ago
- Mirror of Apache Arrow ☆32 · Updated last month
- Spark Connector to read and write with Pulsar ☆113 · Updated 2 weeks ago
- A High Performance Cluster Consumer for Kafka that creates Avro (boom) files in Hadoop in time based directory paths ☆42 · Updated 8 years ago
- The Apache Storm implementation of the Bullet backend ☆40 · Updated last year
- A docker image for testing MemSQL + MemSQL Ops ☆68 · Updated last year
- Quark is a data virtualization engine over analytic databases. ☆99 · Updated 7 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- Insight Engineering Platform Components ☆90 · Updated last week
- Common utilities for Apache Kafka ☆36 · Updated last year
- Cascading on Apache Flink® ☆54 · Updated 9 months ago