intuit / thrive
Thrive is an ETL framework that runs single-row transformations on HDFS data and makes the data available in relational databases (Hive and Vertica).
☆10 · Updated 7 years ago
Alternatives and similar repositories for thrive:
Users who are interested in thrive are comparing it to the libraries listed below.
- Standard evaluations for binary classifiers so you don't have to ☆315 · Updated 6 years ago
- ☆18 · Updated 7 years ago
- Python SDK for accessing Qubole Data Service ☆52 · Updated 3 weeks ago
- This repository contains code files, specifically IPython notebooks, for the assignments in the course "Introduction to Big Data with Apach… ☆115 · Updated 7 months ago
- A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support ☆261 · Updated 7 years ago
- ☆263 · Updated 5 years ago
- Feature engineering and machine learning: together at last! ☆24 · Updated 4 years ago
- A repository with different graph processing technologies ☆11 · Updated 9 years ago
- Content for architecting a data science platform for products using Luigi, Spark & Flask. ☆163 · Updated 5 years ago
- Distributed decision tree ensemble learning in Scala ☆392 · Updated 6 years ago
- Observations from Ian on successfully delivering data science products ☆543 · Updated 3 years ago
- A collection of essays on career advice. ☆157 · Updated 3 years ago
- "The path to execution", Styx is a service that schedules batch data processing jobs in Docker containers on Kubernetes. ☆267 · Updated last year
- Docker Image and Kubernetes Configurations for Spark 2.x ☆41 · Updated 5 years ago
- BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data its… ☆929 · Updated last year
- R Code + R Notebook for analyzing millions of Amazon reviews using Apache Spark ☆83 · Updated 8 years ago
- Some thoughts on how to use machine learning in production ☆72 · Updated 7 years ago
- MacroBase: A Search Engine for Fast Data ☆666 · Updated 2 years ago
- Secondary indexing for structured and unstructured data in Big Table style databases. ☆44 · Updated 5 years ago
- REST web service for scoring PMML models ☆50 · Updated 11 years ago
- The spacetime framework for simulations ☆20 · Updated 4 years ago
- A cookbook for installing and configuring Apache Spark ☆11 · Updated 6 years ago
- Generates more or less realistic log data for testing simple aggregation queries. ☆257 · Updated last year
- Amazon Elastic MapReduce code samples ☆63 · Updated 9 years ago
- Bayesian inference in Scala. ☆435 · Updated last year
- Model assisted random sampling. ☆120 · Updated 4 years ago
- A collection of data science examples implemented across a variety of languages and libraries. ☆33 · Updated 9 years ago
- This repository contains materials for demos, tutorials, and talks by Dato Inc. ☆172 · Updated 8 years ago
- A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. New implementa… ☆890 · Updated 9 years ago
- A Java Toolbox for Scalable Probabilistic Machine Learning ☆119 · Updated last year