A complete search engine experience built on top of a 75 GB Wikipedia corpus, with subsecond search latency. Results are Wikipedia pages ranked by TF-IDF relevance to the given search term(s). From optimized code to the k-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.
☆19 · Oct 16, 2019 · Updated 6 years ago
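The description names two building blocks: a k-way merge of sorted partial indexes and TF-IDF ranking. Below is a minimal Python sketch of both. It assumes in-memory lists of `(term, doc_id, term_frequency)` tuples as stand-ins for the on-disk index runs. The names `k_way_merge`, `build_postings`, and `tfidf_search`, the tuple format, and the log-scaled TF weighting are all hypothetical illustrations, not the project's actual code.

```python
import heapq
import math
from collections import Counter
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical posting format: (term, doc_id, term_frequency), each run sorted by (term, doc_id).
Posting = Tuple[str, int, int]


def k_way_merge(runs: List[Iterable[Posting]]) -> Iterator[Posting]:
    """Merge k individually sorted runs into one sorted stream using a min-heap."""
    iterators = [iter(run) for run in runs]
    heap = []
    # Seed the heap with the first entry of every non-empty run.
    for run_index, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, run_index))
    heapq.heapify(heap)
    while heap:
        # Pop the globally smallest entry, then refill from the run it came from.
        smallest, run_index = heapq.heappop(heap)
        yield smallest
        following = next(iterators[run_index], None)
        if following is not None:
            heapq.heappush(heap, (following, run_index))


def build_postings(merged: Iterable[Posting]) -> Dict[str, Dict[int, int]]:
    """Group the merged, term-sorted stream into term -> {doc_id: term_frequency}."""
    postings: Dict[str, Dict[int, int]] = {}
    for term, group in groupby(merged, key=itemgetter(0)):
        docs: Dict[int, int] = {}
        for _, doc_id, tf in group:
            docs[doc_id] = docs.get(doc_id, 0) + tf
        postings[term] = docs
    return postings


def tfidf_search(query: List[str], postings: Dict[str, Dict[int, int]],
                 n_docs: int) -> List[Tuple[int, float]]:
    """Rank documents by their summed TF-IDF weight over the query terms."""
    scores: Counter = Counter()
    for term in query:
        docs = postings.get(term)
        if not docs:
            continue  # term does not occur in the index
        idf = math.log(n_docs / len(docs))
        for doc_id, tf in docs.items():
            # Log-scaled term frequency (an assumption; raw tf would also work).
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by doc_id so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


runs = [
    [("apple", 1, 2), ("banana", 1, 1)],
    [("apple", 2, 1), ("cherry", 2, 3)],
    [("banana", 3, 2), ("cherry", 3, 1)],
]
index = build_postings(k_way_merge(runs))
# Doc 1 (apple tf=2) ranks above doc 2 (apple tf=1).
print(tfidf_search(["apple"], index, n_docs=3))
```

The heap holds only one entry per run, so memory grows with the number of runs k rather than with corpus size. That property is what makes a k-way merge suitable for combining partial indexes built from a dump too large to sort in RAM. A real indexer would read the runs from disk file handles instead of lists. The standard library's `heapq.merge(*runs)` performs the same merge and could replace the hand-written loop.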
Alternatives and similar repositories for Wikipedia-Search-Engine
Users that are interested in Wikipedia-Search-Engine are comparing it to the libraries listed below
- Playground for pyspark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing (including incorrectly formatted strings) web … ☆18 · Feb 21, 2022 · Updated 4 years ago
- Big Data webapp using Chicago street congestion, crashes, red light violations, and speed camera violations ☆44 · Jan 9, 2021 · Updated 5 years ago
- ☆12 · Jul 22, 2025 · Updated 7 months ago
- Project - Data Processing and Analysis in Python Course ☆39 · Oct 10, 2018 · Updated 7 years ago
- Indonesian news app using the API from https://newsapi.org/ ☆10 · Jan 20, 2020 · Updated 6 years ago
- A BigQuery adapter for Harlequin, a SQL IDE for the terminal. ☆10 · Jan 19, 2025 · Updated last year
- Streamlit dashboard over Superstore data stored in a Postgres Docker container, built with SQLAlchemy + Plotly Express ☆13 · Oct 16, 2024 · Updated last year
- This is the LinkedIn Learning repository for Level Up: Python Data Acquisitions, Prep, & EDA. ☆15 · Mar 4, 2025 · Updated last year
- This dataset contains hotel booking information; we performed exploratory data analysis in Python to gain insights from the data. ☆13 · Apr 12, 2020 · Updated 5 years ago
- my zsh configuration ☆13 · Jun 26, 2025 · Updated 8 months ago
- ☆15 · Jul 31, 2022 · Updated 3 years ago
- A simple Dash and Plotly dashboard to review and compare federal economic data ☆13 · Feb 1, 2022 · Updated 4 years ago
- MCP server for managing and serving analysis prompt templates ☆21 · Dec 13, 2024 · Updated last year
- Original Caliburn project from codeplex ☆16 · Feb 21, 2020 · Updated 6 years ago
- Complete PySpark guide for beginners. I prepared this notebook for my students. ☆19 · Sep 18, 2019 · Updated 6 years ago
- Data visualisations in Power BI ☆30 · Nov 14, 2021 · Updated 4 years ago
- ☆17 · Feb 11, 2022 · Updated 4 years ago
- This repo is for the Linkedin Learning course: Testing Python Data Science Code ☆20 · Sep 26, 2025 · Updated 5 months ago
- ☆25 · Apr 23, 2022 · Updated 3 years ago
- MachineHack is an online platform for Machine Learning competitions. We host the toughest business problems that can now find solutions in Ma… ☆19 · Oct 25, 2023 · Updated 2 years ago
- MongoDB_Study_Materials ☆34 · Jul 28, 2023 · Updated 2 years ago
- ☆33 · Updated this week
- Tutorial for creating a simple storage contract using Ethers. ☆21 · Aug 28, 2022 · Updated 3 years ago
- This repo is for the Linkedin Learning course: End-to-End Data Engineering Project ☆29 · Nov 9, 2023 · Updated 2 years ago
- ☆30 · Jul 18, 2022 · Updated 3 years ago
- Simple way to send ether. ☆24 · Nov 23, 2020 · Updated 5 years ago
- Learn Power BI, second edition, published by Packt. ☆40 · Feb 5, 2026 · Updated last month
- ☆119 · Dec 21, 2025 · Updated 2 months ago
- pandas, numpy, matplotlib, data-wrangling ☆38 · Updated this week
- ☆37 · Aug 11, 2024 · Updated last year
- This repo is for the Linkedin Learning course: Advanced AI: Transformers for Computer Vision ☆38 · Jun 21, 2025 · Updated 8 months ago
- iTASK - Intelligent Traffic Analysis Software Kit ☆29 · Dec 8, 2022 · Updated 3 years ago
- ☆42 · Oct 15, 2024 · Updated last year
- Disease Prediction based on Symptoms. ☆334 · Feb 15, 2023 · Updated 3 years ago
- ☆41 · Apr 10, 2024 · Updated last year
- MLflow related work ☆40 · Sep 11, 2023 · Updated 2 years ago
- Fnug runs all your lints, tests and commands at once, in the terminal, with git integration and file watching ☆54 · Feb 25, 2026 · Updated last week
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆48 · Dec 4, 2023 · Updated 2 years ago
- Streaming Anomaly Detection Solution using Pub/Sub, Dataflow, BQML & Cloud DLP ☆192 · Jan 5, 2026 · Updated 2 months ago