sirrice / dbtruck
just put my data in a database!
☆39 Updated 9 years ago
Alternatives and similar repositories for dbtruck
Users who are interested in dbtruck are comparing it to the libraries listed below.
- ☆92 Updated 9 years ago
- Functional, Typesafe, Declarative Data Pipelines ☆139 Updated 7 years ago
- An open-source, vendor-neutral data context service. ☆160 Updated 7 years ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆58 Updated 4 years ago
- A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows. ☆54 Updated 10 years ago
- Live-updating Spark UI built with Meteor ☆189 Updated 4 years ago
- ☆110 Updated 8 years ago
- Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine ☆167 Updated 4 years ago
- A Cascading Workflow Visualizer ☆83 Updated 2 years ago
- Looking at big data? Add a little salt. ☆59 Updated 2 years ago
- Generates more or less realistic log data for testing simple aggregation queries. ☆260 Updated last year
- Complete Pipeline Training at Big Data Scala By the Bay ☆71 Updated 9 years ago
- Create Parquet files from CSV ☆69 Updated 8 years ago
- A platform for real-time streaming search ☆102 Updated 9 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆474 Updated 8 years ago
- Apache Spark AWS Lambda Executor (SAMBA) ☆44 Updated 7 years ago
- Pig on Apache Spark ☆82 Updated 10 years ago
- Standard evaluations for binary classifiers so you don't have to ☆315 Updated 6 years ago
- SociaLite: query language for large-scale graph analysis and data mining ☆110 Updated 9 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 Updated 9 years ago
- BigTable, Document and Graph Database with Full Text Search ☆186 Updated 7 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆93 Updated 9 years ago
- Distributed decision tree ensemble learning in Scala ☆390 Updated 6 years ago
- Scheduled task execution on top of AWS Data Pipeline ☆43 Updated 10 years ago
- Utils around luigi. ☆66 Updated last month
- Serving system for batch generated data sets ☆177 Updated 8 years ago
- A Python library for creating fast, repeatable and self-documenting data analysis pipelines. ☆241 Updated 2 weeks ago
- Implementation of "A Parallel Spatial Co-location Mining Algorithm Based on MapReduce" paper ☆49 Updated 7 years ago
- Distributed Streaming Quantiles (for PySpark) ☆38 Updated 11 years ago
- Google BigQuery support for Spark, SQL, and DataFrames ☆155 Updated 5 years ago