datahuborg / datahub
An experimental hosted platform (GitHub-like) for organizing, managing, sharing, collaborating, and making sense of data.
☆210 · Updated 6 years ago
Related projects:
- An open-source, vendor-neutral data context service. ☆158 · Updated 6 years ago
- ☆92 · Updated 8 years ago
- MacroBase: A Search Engine for Fast Data ☆660 · Updated last year
- Myria is a scalable Analytics-as-a-Service platform based on relational algebra. ☆112 · Updated 2 years ago
- BlinkDB: Sub-Second Approximate Queries on Very Large Data. ☆660 · Updated 10 years ago
- ☆388 · Updated this week
- A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. New implementa… ☆889 · Updated 8 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆468 · Updated 7 years ago
- SociaLite: query language for large-scale graph analysis and data mining ☆109 · Updated 8 years ago
- Lightweight Tableau-style interface for visual analysis, built on Vega-lite. ☆367 · Updated 7 years ago
- SDK for Turi's GraphLab Create. ☆149 · Updated 6 years ago
- Large scale query engine benchmark ☆99 · Updated 8 years ago
- MLDB is the Machine Learning Database ☆664 · Updated last year
- ☆146 · Updated 8 years ago
- A Machine Learning System for Data Enrichment. ☆75 · Updated 6 years ago
- ☆334 · Updated this week
- BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data its… ☆921 · Updated 10 months ago
- ☆399 · Updated this week
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- Mirror of Apache Samoa (Incubating) ☆246 · Updated last year
- ☆64 · Updated 12 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 · Updated 8 years ago
- FlashX is a collection of big data analytics tools that perform data analytics in the form of graphs and matrices. ☆231 · Updated 4 years ago
- ☆110 · Updated 7 years ago
- ☆46 · Updated 7 years ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆57 · Updated 3 years ago
- Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets ☆91 · Updated 8 years ago
- Google Dataflow Runner for Apache Flink™ (deprecated; please use the up-to-date Beam Runner) ☆88 · Updated 8 years ago
- Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine ☆168 · Updated 3 years ago
- Scalable Machine Learning in Scalding ☆361 · Updated 6 years ago