ziky90 / tf-idf-Hadoop-MapReduce
Project from the CTU Big Data course whose purpose was to compute tf-idf values for the Czech Wikipedia
☆10 · Jul 8, 2014 · Updated 11 years ago
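For readers unfamiliar with the tf-idf values the project description mentions, here is a minimal sketch in plain Python. It is not the repository's code: the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are all assumptions chosen for illustration, and the phases only loosely mirror how a MapReduce job might split the work.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Compute tf-idf scores for a dict of {doc_id: text}.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # "Map"-like phase: count terms per document (naive whitespace tokenizer).
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }

    # "Reduce"-like phase: document frequency, i.e. how many documents
    # contain each term at least once.
    doc_freq = defaultdict(int)
    for counts in term_counts.values():
        for term in counts:
            doc_freq[term] += 1

    # Final phase: tf (relative frequency in the document) times
    # idf (log of total documents over document frequency).
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores
```

With this weighting, a term that occurs in every document gets a score of zero, while terms concentrated in a few documents score higher.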
Alternatives and similar repositories for tf-idf-Hadoop-MapReduce
Users who are interested in tf-idf-Hadoop-MapReduce are comparing it to the repositories listed below
- Show ML predictions w/ Streamlit ☆12 · Apr 2, 2024 · Updated last year
- This end-to-end-recommender-system repo has a full recommender system implementation, from collecting data, modeling and deploying machin… ☆13 · May 19, 2021 · Updated 4 years ago
- Spark Notebook docker image ☆10 · Dec 29, 2017 · Updated 8 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆11 · Nov 18, 2023 · Updated 2 years ago
- ☆12 · Oct 2, 2020 · Updated 5 years ago
- Code for the Java Design Patterns Video Tutorial ☆15 · Mar 18, 2018 · Updated 7 years ago
- ☆15 · Jun 22, 2020 · Updated 5 years ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts" ☆19 · May 29, 2023 · Updated 2 years ago
- End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API… ☆20 · Jul 26, 2024 · Updated last year
- Data Engineering Project in GCP ☆22 · Mar 29, 2023 · Updated 2 years ago
- Data warehouse implementation for an e-commerce website “Infibeam” that sells digital and consumer electronics. ☆21 · Jan 28, 2018 · Updated 8 years ago
- A python telegram bot that scrapes content from a web and displays it to the users. ☆17 · Oct 18, 2019 · Updated 6 years ago
- AmbSQL is a DBMS that is most EASY to operate on. ☆25 · Jul 15, 2021 · Updated 4 years ago
- This is a recipe for docker container based architecture based on airflow, kafka, spark, docker ☆20 · Oct 15, 2024 · Updated last year
- My current portfolio ☆25 · Apr 3, 2024 · Updated last year
- Data Structures in Java ☆24 · Feb 10, 2022 · Updated 4 years ago
- Implementation and Benchmark Splits to study Out-of-Distribution Generalization in Deep Metric Learning. ☆25 · Oct 2, 2021 · Updated 4 years ago
- Infrastructure for starting TG bot project. Postgres, Minio, Grafana, Alembic ☆22 · Jul 15, 2022 · Updated 3 years ago
- Data Engineering Bootcamp ☆30 · Aug 5, 2025 · Updated 6 months ago
- SpringBoot Mongo Rest API tutorial ☆28 · Oct 16, 2018 · Updated 7 years ago
- This repo is for the Linkedin Learning course: End-to-End Data Engineering Project ☆28 · Nov 9, 2023 · Updated 2 years ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline ☆31 · Feb 19, 2024 · Updated last year
- Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop Cluster that contains the necessary… ☆31 · Jul 6, 2021 · Updated 4 years ago
- Students Performance Evaluation using Feature Engineering, Feature Extraction, Manipulation of Data, Data Analysis, Data Visualization an… ☆33 · Jun 11, 2020 · Updated 5 years ago
- Used Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline ☆29 · Oct 25, 2023 · Updated 2 years ago
- ViewPager that displays items from right to left for RTL locales and behaves like a regular ViewPager otherwise ☆33 · Jan 3, 2016 · Updated 10 years ago
- ☆32 · Mar 7, 2018 · Updated 7 years ago
- GitHub Action for the Community, from welcoming first timers to badges ☆36 · May 10, 2024 · Updated last year
- This project shows how to capture changes from postgres database and stream them into kafka ☆41 · May 17, 2024 · Updated last year
- Markdown auto-formatting, beautification, and cleanup for Atom ☆45 · Mar 4, 2023 · Updated 2 years ago
- RuCLIP tiny (Russian Contrastive Language–Image Pretraining) is a neural network trained to work with different pairs (images, texts). ☆35 · Jul 16, 2022 · Updated 3 years ago
- ☆47 · Feb 23, 2021 · Updated 4 years ago
- This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, component… ☆37 · Sep 26, 2024 · Updated last year
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆45 · Dec 11, 2023 · Updated 2 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to Apache spark cluster in different programming languages using Python… ☆48 · Mar 14, 2024 · Updated last year
- Big Data webapp using Chicago street congestion, crashes, red light violations, and speed camera violations ☆44 · Jan 9, 2021 · Updated 5 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆47 · Dec 4, 2023 · Updated 2 years ago
- This repository contains the code for our CVPR 2022 paper on "Integrating Language Guidance into Vision-based Deep Metric Learning". ☆44 · Aug 9, 2022 · Updated 3 years ago
- The list of countries stored in different file formats. The data include country names in both English and Arabic languages. ☆66 · Jan 4, 2023 · Updated 3 years ago