LinkedInAttic / apache-incubator-gobblin
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
☆11 Updated 7 years ago
Alternatives and similar repositories for apache-incubator-gobblin:
Users who are interested in apache-incubator-gobblin are comparing it to the libraries listed below.
- Temporal_Graph_library ☆25 Updated 6 years ago
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy ☆62 Updated last year
- Insight Engineering Platform Components ☆90 Updated last month
- A High Performance Cluster Consumer for Kafka that creates Avro (boom) files in Hadoop in time based directory paths ☆42 Updated 8 years ago
- A distributed database with a built in streaming data platform ☆58 Updated last month
- Examples of user defined functions for Apache Drill ☆19 Updated 7 years ago
- Cascading on Apache Flink® ☆54 Updated last year
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 Updated 2 years ago
- ☆29 Updated last year
- A shim for using Cassandra as a backend for OpenTSDB. Not to be used as a general Cassandra client. ☆7 Updated 5 years ago
- A distributed generic query layer for Apache Kafka Interactive Queries ☆26 Updated 7 years ago
- An example of building kubernetes operator (Flink) using Abstract operator's framework ☆26 Updated 5 years ago
- Read druid segments from hadoop ☆10 Updated 8 years ago
- A template-based cluster provisioning system ☆61 Updated 2 years ago
- Feature rich service discovery on ZooKeeper ☆29 Updated 2 years ago
- Apache Amaterasu ☆56 Updated 5 years ago
- Starter project for building MemSQL Streamliner Pipelines ☆32 Updated 7 years ago
- Example using Grafana with Druid ☆11 Updated 9 years ago
- The Apache Storm implementation of the Bullet backend ☆40 Updated last year
- Fast and efficient batch computation engine for complex analysis and reporting of massive datasets on Hadoop ☆243 Updated 9 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆71 Updated 2 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 Updated last month
- diqube is a fast, distributed, in-memory column-store which enables you to analyze large amounts of read-only data easily ☆18 Updated 2 years ago
- Cloudbreak Deployer Tool ☆34 Updated last year
- A Kafka Streams process to convert __consumer_offsets to a JSON-readable topic ☆13 Updated 5 years ago
- LinkedIn's version of Apache Calcite ☆22 Updated 4 months ago
- Java client library for Pilosa ☆19 Updated 2 years ago
- Use cases built on SnappyData. Use cases contained here: 1. Ad Analytics 2. Streaming data ingestion from RabbitMQ. ☆32 Updated 2 years ago
- Spooker is a dynamic framework for processing high volume data streams via processing pipelines ☆29 Updated 9 years ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 4 years ago