A complete search engine built on a 75 GB Wikipedia corpus, with sub-second search latency. Results are Wikipedia pages ranked by TF-IDF relevance to the given search terms. From optimized code to a k-way merge sort, this project addresses latency, indexing, and big-data challenges.
☆19 · Oct 16, 2019 · Updated 6 years ago
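The two techniques named in the description, a k-way merge of sorted index chunks and TF-IDF ranking, can be sketched roughly as follows. This is a minimal in-memory illustration, not the repository's actual code: the function names `kway_merge` and `tfidf_search`, the `(term, doc_id, term_freq)` tuple layout, and the log-scaled TF weighting are all assumptions made for the example. A real index over a 75 GB corpus would stream the sorted runs from disk rather than hold them in lists.

```python
import heapq
import math
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def kway_merge(runs):
    """Merge k sorted runs of (term, doc_id, term_freq) into one index.

    Each run must already be sorted by (term, doc_id). heapq.merge pulls
    lazily from every run, so only one pending item per run is held at a
    time; that is what makes a k-way merge suitable for on-disk chunks.
    """
    merged = heapq.merge(*runs)
    index = {}
    # Consecutive tuples with the same term form that term's posting list.
    for term, group in groupby(merged, key=itemgetter(0)):
        index[term] = [(doc_id, tf) for _, doc_id, tf in group]
    return index


def tfidf_search(index, total_docs, query):
    """Return doc ids ranked by summed TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue  # term does not occur in the corpus
        idf = math.log(total_docs / len(postings))
        for doc_id, tf in postings:
            # Log-scaled term frequency (an assumed weighting variant).
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three hypothetical partial indexes, each sorted by (term, doc_id).
runs = [
    [("engine", 1, 2), ("search", 1, 3)],
    [("engine", 2, 1), ("wiki", 2, 4)],
    [("search", 3, 1), ("wiki", 3, 1)],
]
index = kway_merge(runs)
ranked = tfidf_search(index, total_docs=3, query="search engine")
print(ranked)
```

Document 1 ranks first here because it contains both query terms. Merging sorted runs this way keeps memory use proportional to the number of runs rather than to the size of the corpus, which is presumably why the project relies on it for indexing.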
Alternatives and similar repositories for Wikipedia-Search-Engine
Users that are interested in Wikipedia-Search-Engine are comparing it to the libraries listed below.
- Playground for pyspark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing (including incorrectly formatted strings) web … ☆18 · Feb 21, 2022 · Updated 4 years ago
- Data warehouse implementation for an e-commerce website “Infibeam” that sells digital and consumer electronics. ☆23 · Jan 28, 2018 · Updated 8 years ago
- The smart city reference pipeline shows how to integrate various media building blocks, with analytics powered by the OpenVINO™ Toolkit, … ☆218 · May 5, 2025 · Updated last year
- This is the LinkedIn Learning repository for Level Up: Python Data Acquisitions, Prep, & EDA. ☆15 · Mar 4, 2025 · Updated last year
- An end-to-end ETL pipeline that extracts weather data, transforms it, and loads it into a PostgreSQL database. ☆14 · Sep 6, 2024 · Updated last year
- A BigQuery adapter for Harlequin, a SQL IDE for the terminal. ☆10 · Jan 19, 2025 · Updated last year
- Handy Reusable Utilities ☆22 · Nov 13, 2021 · Updated 4 years ago
- Python Essentials for AWS Cloud Developers, published by Packt. ☆12 · Apr 27, 2023 · Updated 3 years ago
- Streamlit dashboard over Superstore data stored in a Postgres Docker container, built with SQLAlchemy and Plotly Express. ☆12 · Oct 16, 2024 · Updated last year
- This is a guided certification project, as part of the Data Science for Social Good initiative. ☆18 · Mar 9, 2020 · Updated 6 years ago
- This repository contains code to build an MVP search engine with a Google-like interface. ☆17 · Mar 25, 2026 · Updated 2 months ago
- Case Studies and Projects in Machine Learning/EDA/DL ☆24 · Jun 18, 2024 · Updated last year
- weighted category-balanced dataset builder for LLM fine-tuning ☆16 · Feb 21, 2026 · Updated 3 months ago
- ☆24 · Jan 6, 2024 · Updated 2 years ago
- This repo is for the LinkedIn Learning course: Complete Guide to SQL for Data Engineering: from Beginner to Advanced ☆47 · Mar 20, 2025 · Updated last year
- my zsh configuration ☆13 · Jun 26, 2025 · Updated 11 months ago
- iTASK - Intelligent Traffic Analysis Software Kit ☆30 · Dec 8, 2022 · Updated 3 years ago
- ☆17 · Feb 11, 2022 · Updated 4 years ago
- Data visualisations in Power BI ☆31 · Nov 14, 2021 · Updated 4 years ago
- RealTime StockStream is a streamlined simulation system for processing live stock market data. It uses Apache Kafka for data input, Apac… ☆31 · Feb 18, 2025 · Updated last year
- ☆30 · Jan 17, 2023 · Updated 3 years ago
- A program written in C++ that emulates a bogus CPU ☆22 · May 6, 2024 · Updated 2 years ago
- This repo is for the LinkedIn Learning course: End-to-End Data Engineering Project ☆32 · Nov 9, 2023 · Updated 2 years ago
- This dataset contains hotel booking information. We have performed exploratory data analysis in Python to gain insight from the data. ☆13 · Apr 12, 2020 · Updated 6 years ago
- Video surveillance units are usually the first element of a security system. While they are the most intuitive to understand and can be p… ☆36 · Oct 18, 2014 · Updated 11 years ago
- ☆16 · Jul 31, 2022 · Updated 3 years ago
- Disease Prediction based on Symptoms. ☆342 · Feb 15, 2023 · Updated 3 years ago
- The AI-powered CLI Assistant ☆30 · May 24, 2024 · Updated 2 years ago
- An ETL pipeline that extracts weather and air quality data from public APIs, transforms the data into a clean, analyzable format, and loa… ☆45 · Sep 21, 2024 · Updated last year
- Complete PySpark guide for beginners. I prepared this notebook for my students. ☆19 · Sep 18, 2019 · Updated 6 years ago
- MLgenerator is a web app that helps you generate machine learning starter code with ease. ☆34 · Feb 5, 2021 · Updated 5 years ago
- Streaming Anomaly Detection Solution by using Pub/Sub, Dataflow, BQML & Cloud DLP ☆191 · Jan 5, 2026 · Updated 4 months ago
- MachineHack is an online platform for Machine Learning competitions. We host the toughest business problems that can now find solutions in Ma… ☆19 · Oct 25, 2023 · Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆51 · Dec 4, 2023 · Updated 2 years ago
- ☆42 · Apr 10, 2024 · Updated 2 years ago
- Bootcamp to learn basics in Data Engineering ☆40 · Mar 28, 2021 · Updated 5 years ago
- Beyond the basics - Learn Spark ☆28 · Oct 15, 2017 · Updated 8 years ago
- This repo is the solution for the Landslide4Sense challenge. ☆55 · Jul 1, 2022 · Updated 3 years ago
- ☆29 · Jun 2, 2024 · Updated last year