Quickly ingest messy CSV and XLS files. Export to clean pandas, SQL, parquet
☆196 · Jun 9, 2023 · Updated 2 years ago
Alternatives and similar repositories for d6tstack
Users that are interested in d6tstack are comparing it to the libraries listed below.
- Fuzzy joins for python pandas - easily join different datasets ☆59 · Aug 11, 2020 · Updated 5 years ago
- Plugin for Intake to read from SQL servers ☆15 · May 29, 2023 · Updated 2 years ago
- Push and pull data files like code ☆175 · Jul 20, 2023 · Updated 2 years ago
- python package for performing deduplication using flexible text matching and cleaning in pandas dataframe ☆25 · Nov 30, 2020 · Updated 5 years ago
- ☆16 · Jan 20, 2019 · Updated 7 years ago
- Python library for building highly effective data science workflows ☆947 · Jul 20, 2023 · Updated 2 years ago
- SnapLoc is a product that does automatic image classification and spatio-temporal analysis in order to recommend the places of interest i… ☆15 · Mar 21, 2018 · Updated 8 years ago
- ☆12 · Aug 4, 2020 · Updated 5 years ago
- A collection of utilities and tools for teams and organizations using dbt ☆15 · Nov 24, 2023 · Updated 2 years ago
- Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline. ☆17 · Jan 29, 2026 · Updated last month
- Dockerfile for Apache Zeppelin ☆17 · Dec 9, 2015 · Updated 10 years ago
- Set-oriented Operations in Pandas ☆24 · May 27, 2020 · Updated 5 years ago
- Tool to dump all GPS traces collected by/for the OpenStreetMap project. ☆25 · Mar 6, 2019 · Updated 7 years ago
- Official dbt adapter for Vertica ☆28 · Jun 13, 2025 · Updated 9 months ago
- Forest Management Tool, a C++ library for forest planning. ☆16 · Updated this week
- Simple tool to pull posts and users from Gab ☆16 · Jan 8, 2026 · Updated 2 months ago
- Curso Introdutório de Python do grupy-sanca ☆22 · Oct 2, 2025 · Updated 5 months ago
- These are the famous Uppsala Software Factory programs, rescued into GitHub. ☆10 · Nov 3, 2022 · Updated 3 years ago
- From Dataset Labeling, Entity Extraction to production Knowledge Graph Deployment: The Power of NLP and LLMs Combined. ☆12 · May 14, 2024 · Updated last year
- A Postgres backed STAC API. ☆31 · Dec 22, 2022 · Updated 3 years ago
- Superfast betabinomial fit implemented in Cython ☆15 · Oct 21, 2025 · Updated 5 months ago
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ☆2,641 · Mar 20, 2024 · Updated 2 years ago
- Data Vault 2.0: Code generation, Vertica, Airflow ☆13 · Nov 20, 2019 · Updated 6 years ago
- sqldf for pandas ☆1,349 · Jul 24, 2024 · Updated last year
- Intake is a lightweight package for finding, investigating, loading and disseminating data. ☆1,071 · Mar 9, 2026 · Updated 2 weeks ago
- ERPL is a DuckDB extension to connect to API based ecosystems via standard interfaces like OData, GraphQL and REST. This works e.g. for S… ☆26 · Mar 12, 2026 · Updated last week
- End to end mlflow with feast example ☆17 · May 18, 2021 · Updated 4 years ago
- An Implementation of ERNIE For Language Understanding (including Pre-training models and Fine-tuning tools) ☆27 · Jul 30, 2019 · Updated 6 years ago
- Handle project folder, template and file templates in JupyterLab ☆15 · Nov 14, 2022 · Updated 3 years ago
- Linux kernel for SHIELD ☆23 · Mar 12, 2015 · Updated 11 years ago
- Component for displaying KPI widgets on a Streamlit dashboard ☆18 · Aug 25, 2021 · Updated 4 years ago
- OpenStreetMap / OpenAddresses.io geocoder written in python ☆17 · Jul 15, 2022 · Updated 3 years ago
- notebookJS: seamless JavaScript integration in Python Notebooks ☆177 · Dec 25, 2022 · Updated 3 years ago
- A thread synchronized queue made for PThreads ☆11 · Jan 15, 2021 · Updated 5 years ago
- Matplotlib style configurator, built with Streamlit ☆29 · Jul 8, 2020 · Updated 5 years ago
- Analysis pipeline for quick ML analyses. ☆11 · Oct 4, 2018 · Updated 7 years ago
- An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks ☆56 · Mar 10, 2026 · Updated last week
- Factor Risk Parity Portfolio Construction algorithm. Built during my Master's final project. Backtested on the S&P500. ☆11 · Sep 18, 2022 · Updated 3 years ago
- Cross-identification of radio objects and host galaxies by applying machine learning on crowdsourced training labels. ☆13 · Jun 21, 2022 · Updated 3 years ago